| Summary: | >dev-db/mariadb-10.2.22-r2 crashes during startup on ppc64-linux | | |
|---|---|---|---|
| Product: | Gentoo Linux | Reporter: | Fabian Groffen <grobian> |
| Component: | Current packages | Assignee: | Gentoo Linux MySQL bugs team <mysql-bugs> |
| Status: | CONFIRMED --- | | |
| Severity: | normal | CC: | daniel, grobian, hydrapolic |
| Priority: | Normal | | |
| Version: | unspecified | | |
| Hardware: | PPC64 | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Package list: | | Runtime testing required: | --- |
Description
Fabian Groffen
The 10.3 and 10.4 don't actually check the hardware capabilities. I did assume a lack of < POWER7 users, I'm sorry. 10.5 does check this correctly, if that is acceptable to you.

https://github.com/MariaDB/server/blob/10.5/mysys/crc32/crc32c.cc#L474

Hi Daniel, thanks for checking in. Since what version of 10.5 should this be fixed? I tried 10.5.8 and got the same crash.

https://github.com/MariaDB/server/commit/ccbe6bb6fc3cbe31e74404723f4ab78f7c530950 so should be 10.5.7.

What does `LD_SHOW_AUXV=1 /bin/true` show?

This assumption is definitely wrong:
https://github.com/MariaDB/server/blob/10.6/mysys/CMakeLists.txt#L118-L122

I think I was aiming for a "would be available if hardware upgraded" approach. Hence I'd still like to see the POWER4 HWCAP2 flags.

I can replace the HAVE_POWER8 by ifdef _ARCH_PWR8 which is defined in the ISA (like include/my_cpu.h). But with this info, I will look closer next week.

(In reply to Daniel Black from comment #3)
> https://github.com/MariaDB/server/commit/ccbe6bb6fc3cbe31e74404723f4ab78f7c530950 so should be 10.5.7.
>
> What does `LD_SHOW_AUXV=1 /bin/true` show?
    % env LD_SHOW_AUXV=1 /bin/true
    AT_DCACHEBSIZE: 0x80
    AT_ICACHEBSIZE: 0x80
    AT_UCACHEBSIZE: 0x0
    AT_SYSINFO_EHDR: 0x3fffa51c0000
    AT_L1I_CACHESIZE: 65536
    AT_L1I_CACHEGEOMETRY: 128B line size, Directly mapped
    AT_L1D_CACHESIZE: 32768
    AT_L1D_CACHEGEOMETRY: 128B line size, 2-way set associative
    AT_L2_CACHESIZE: 1048576
    AT_L2_CACHEGEOMETRY: 128B line size, 8-way set associative
    AT_L3_CACHESIZE: 0
    AT_L3_CACHEGEOMETRY: Unknown line size, Unknown associativity
    AT_HWCAP: power4 mmu fpu altivec ppc64 ppc32
    AT_PAGESZ: 4096
    AT_CLKTCK: 100
    AT_PHDR: 0x12cbd7040
    AT_PHENT: 56
    AT_PHNUM: 8
    AT_BASE: 0x3fffa5178000
    AT_FLAGS: 0x0
    AT_ENTRY: 0x12cbf67c8
    AT_UID: 501
    AT_EUID: 501
    AT_GID: 500
    AT_EGID: 500
    AT_SECURE: 0
    AT_RANDOM: 0x3fffd5cac7b2
    AT_HWCAP2:
    AT_EXECFN: /bin/true
    AT_PLATFORM: ppc970
    AT_BASE_PLATFORM: ppc970

> This assumption is definitely wrong:
> https://github.com/MariaDB/server/blob/10.6/mysys/CMakeLists.txt#L118-L122
>
> I think I was aiming for a would be available if hardware upgraded approach.
> Hence still like to see the POWER4 HWCAP2 flags.

See above.

> I can replace the HAVE_POWER8 by ifdef _ARCH_PWR8 which is defined in the
> ISA (like include/my_cpu.h). But with info, will look closer next week.

I can live with a less optimised version if that is easier. This is an old box, but it is still running a (master) instance fine; it doesn't have to perform at its top.

Guess I shouldn't be surprised HWCAP2 is blank, given its flags all seem to be for later CPUs:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/powerpc/include/uapi/asm/cputable.h

Even if getauxval returns 0 for "not supported", the check should fail:
https://man7.org/linux/man-pages/man3/getauxval.3.html

Feels like something isn't behaving to spec. Do you mind trying to trace the C implementation of getauxval(AT_HWCAP2)?
not sure if this was what you had in mind:

    % cat x.c
    #include <stdio.h>
    #include <string.h>
    #include <errno.h>
    #include <sys/auxv.h>

    int main() {
        unsigned long x = getauxval(AT_HWCAP2);
        printf("HWCAP2: %lu, (%s)\n", x, strerror(errno));
        x = getauxval(123456);
        printf("garbage: %lu, (%s)\n", x, strerror(errno));
    }
    % gcc -o x x.c
    % ./x
    HWCAP2: 0, (Success)
    garbage: 0, (No such file or directory)
    %

If I try to interpret the man-page it just means that the value is 0 for real for HWCAP2, not because of an error.

yep, Thanks. That looks like what I expected/intended. I've no idea why https://github.com/MariaDB/server/blob/10.5/mysys/crc32/crc32c.cc#L474 is returning true when HWCAP2 is 0.

(hopefully) last guess. Add `-maltivec -mvsx -mpower8-vector -mcrypto -mpower8-vector` cflags to your test program and see if somehow the getauxval call is being optimized away or gives a different result.

(sorry for the late reply)

    % gcc -maltivec -mvsx -mpower8-vector -mcrypto -mpower8-vector -o x x.c
    % ./x
    HWCAP2: 0, (Success)
    garbage: 0, (No such file or directory)

However, after digging somewhat more:

    % cat x.c
    #include <stdio.h>
    #include <string.h>
    #include <errno.h>
    #include <sys/auxv.h>

    int main() {
        unsigned long x = getauxval(AT_HWCAP2);
        printf("HWCAP2: %lu, (%s)\n", x, strerror(errno));
        x = getauxval(123456);
        printf("garbage: %lu, (%s)\n", x, strerror(errno));

    #if __linux__
        printf("we're on Linux\n");
    #if defined(__powerpc64__)
        printf("arch is ppc64\n");
        x = getauxval(AT_HWCAP2);
        printf ("PPC_FEATURE2_VEC_CRYPTO: %s (%lu & %lu)\n",
                x & PPC_FEATURE2_VEC_CRYPTO ? "yes" : "no",
                x, PPC_FEATURE2_VEC_CRYPTO);
        if (getauxval(AT_HWCAP2) & PPC_FEATURE2_VEC_CRYPTO)
            printf("all-in-one-statement detects PPC_FEATURE2_VEC_CRYPTO\n");
    #endif
    #endif
    }
    % gcc -maltivec -mvsx -mpower8-vector -mcrypto -mpower8-vector -o x x.c
    x.c: In function ‘main’:
    x.c:18:8: error: ‘PPC_FEATURE2_VEC_CRYPTO’ undeclared (first use in this function); did you mean ‘PPC_FEATURE2_HAS_VEC_CRYPTO’?
       18 |    x & PPC_FEATURE2_VEC_CRYPTO ? "yes" : "no",
          |        ^~~~~~~~~~~~~~~~~~~~~~~
          |        PPC_FEATURE2_HAS_VEC_CRYPTO
    x.c:18:8: note: each undeclared identifier is reported only once for each function it appears in
    % grep PPC_FEATURE2_VEC_CRYPTO /usr/include/bits/hwcap.h
    %

so is there something defining PPC_FEATURE2_VEC_CRYPTO to 0 via some compatibility code or something? That'd obviously make the condition true.

ok, I copied in

    #ifndef PPC_FEATURE2_VEC_CRYPTO
    #define PPC_FEATURE2_VEC_CRYPTO 0x02000000
    #endif

from the top of the file. now we get:

    % ./x
    HWCAP2: 0, (Success)
    garbage: 0, (No such file or directory)
    we're on Linux
    arch is ppc64
    PPC_FEATURE2_VEC_CRYPTO: no (0 & 33554432)

so back to square one.

I just noticed the code is guarded by HAVE_POWER8 and HAS_ALTIVEC guards. The latter I think would be OK, but HAVE_POWER8 looks suspicious, since this is a POWER4 at most. I cannot seem to find what sets HAVE_POWER8; could it be that this is where the unexpected results come from (e.g. getauxval never being called)?

it seems that the cmake files are a bit blunt:

    include(CheckCCompilerFlag)
    # ppc64 or ppc64le
    if(CMAKE_SYSTEM_PROCESSOR MATCHES "ppc64")
      CHECK_C_COMPILER_FLAG("-maltivec" HAS_ALTIVEC)
      if(HAS_ALTIVEC)
        message(STATUS " HAS_ALTIVEC yes")
        set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -maltivec")
        set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -maltivec")
      endif(HAS_ALTIVEC)
      if(NOT CMAKE_C_FLAGS MATCHES "m(cpu|tune)")
        set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -mcpu=power8")
      endif()
      if(NOT CMAKE_CXX_FLAGS MATCHES "m(cpu|tune)")
        set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -mcpu=power8")
      endif()
      ADD_DEFINITIONS(-DHAVE_POWER8 -DHAS_ALTIVEC)
    endif(CMAKE_SYSTEM_PROCESSOR MATCHES "ppc64")

likewise:

    IF(CMAKE_SYSTEM_PROCESSOR MATCHES "ppc64")
      SET(HAVE_CRC32_VPMSUM 1)
      SET(CRC32_LIBRARY crc32-vpmsum)
      ADD_SUBDIRECTORY(extra/crc32-vpmsum)
    ENDIF()

In other words, if it is ppc64, it is assumed to be HAVE_POWER8. I think PowerPC64 has Altivec alright, but the HAVE_POWER8 should probably be pulled into a conditional of some sort?
I've dropped -DHAVE_POWER8 and -DHAVE_CRC32_VPMSUM from the cmake file now in src_prepare to test on my host:

    # bug 756814
    sed -i -e 's/-DHAVE_POWER8 //' \
        "${S}"/storage/rocksdb/build_rocksdb.cmake || die
    sed -i -e 's/ppc64/got-no-power8/' \
        "${S}"/cmake/crc32.cmake || die

This gives me

    2021-01-05 19:06:29 0 [Note] InnoDB: Using generic crc32 instructions

and a non-crashing mysqld at some first quick tests.

Is this still happening with mariadb-10.5.10? I think some PPC patches landed upstream...

It crashes on startup with Illegal Instruction. But, it writes to the log:

    2021-05-29 15:39:28 0 [Note] InnoDB: Using generic crc32 instructions

so this particular problem is solved in that version, but mysql still isn't usable.