Tensile::FATAL: Cached asm caps differ from derived asm caps for (11, 0, 0)
CMake Error at /usr/lib64/cmake/Tensile/TensileConfig.cmake:277 (message):
  Error creating Tensile library: 255
Call Stack (most recent call first):
  library/src/CMakeLists.txt:89 (TensileCreateLibraryFiles)

Reproducible: Always
Created attachment 923565: emerge --info
Created attachment 923566: sci-libs:rocBLAS-5.7.1-r2:20250404-112432.log
From what I can tell, this happens because the Clang 19 assembler reports that gfx803 cannot execute WMMA, while the cached value says it can. I don't know what changed to create this inconsistency, though. Judging by the rocWMMA package, gfx803 was never supported in the first place; if so, patching the cached/expected value in dev-util/Tensile should fix the configure step.
I was looking in the wrong place for the mismatching architecture: I build for gfx803, but the check fails for at least gfx1100 and gfx1101, which rocWMMA does support.
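To make the failing check concrete, here is a minimal Python sketch of comparing a cached capability table against one derived by probing the assembler. The function names and the "HasWMMA"/"HasMFMA" keys are assumptions for illustration, not Tensile's real data structures; only the message text and the ISA tuple (11, 0, 0), i.e. gfx1100, come from the log.

```python
from typing import Dict, Optional, Tuple

# Hypothetical data shapes: capability name -> bool, one table per ISA such as
# (11, 0, 0). This illustrates the idea only; it is not Tensile's implementation.
Caps = Dict[str, bool]
Mismatch = Tuple[Optional[bool], Optional[bool]]


def diff_caps(cached: Caps, derived: Caps) -> Dict[str, Mismatch]:
    """Return {name: (cached, derived)} for every capability whose values differ.

    A capability present on only one side is reported with None on the other.
    """
    mismatches: Dict[str, Mismatch] = {}
    for name in sorted(set(cached) | set(derived)):
        if cached.get(name) != derived.get(name):
            mismatches[name] = (cached.get(name), derived.get(name))
    return mismatches


def check_caps(isa: Tuple[int, int, int], cached: Caps, derived: Caps) -> None:
    """Raise on any mismatch, in the spirit of the Tensile::FATAL message."""
    mismatches = diff_caps(cached, derived)
    if mismatches:
        raise RuntimeError(
            f"Cached asm caps differ from derived asm caps for {isa}: {mismatches}"
        )


if __name__ == "__main__":
    # Illustrative values only: the cache expects WMMA on gfx1100 (ISA 11.0.0),
    # while a probe run with Clang 19 concludes that WMMA is unavailable.
    cached = {"HasWMMA": True, "HasMFMA": False}
    derived = {"HasWMMA": False, "HasMFMA": False}
    print(diff_caps(cached, derived))  # {'HasWMMA': (True, False)}
```

A single flipped capability is enough to abort the whole library build, which matches the all-or-nothing failure seen here.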
Looking at the code sent to the assembler:

    v_wmma_f32_16x16x16_f16 v[0:7], v[8:15], v[16:23], v[0:7]

with the command line:

    /usr/lib/llvm/19/bin/clang++ -x assembler -target amdgcn-amdhsa -mcpu=gfx1100 -mwavefrontsize64 -

the test fails with the output:

    <stdin>:1:1: error: operands are not valid for this GPU or mode

Compare this with the output for gfx803 or gfx900, which genuinely do not support WMMA:

    <stdin>:1:1: error: instruction not supported on this GPU

So on gfx1100 the assembler recognizes the instruction but rejects its operands, which indicates that the test itself is malformed. An appropriate patch to dev-util/Tensile might give this test operands that are valid for the instruction on every microarchitecture that supports it. The test is defined in Common.py in the Tensile package.
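The difference between the two diagnostics can be sketched in Python. This is a hypothetical helper, not the code in Tensile's Common.py: it reuses the instruction and command line quoted above, and assumes a probe is judged only by clang's exit status and stderr text.

```python
import os
import subprocess
import tempfile
from typing import List

# Instruction and flags are copied from the report; the helpers around them are
# a hypothetical sketch, not the code in Tensile's Common.py.
WMMA_TEST = "v_wmma_f32_16x16x16_f16 v[0:7], v[8:15], v[16:23], v[0:7]"


def probe_command(clang: str, mcpu: str) -> List[str]:
    """Assembler invocation from the report; '-' reads the source from stdin."""
    return [clang, "-x", "assembler", "-target", "amdgcn-amdhsa",
            f"-mcpu={mcpu}", "-mwavefrontsize64", "-"]


def classify(returncode: int, stderr: str) -> str:
    """Turn the assembler's exit status and diagnostics into a coarse verdict."""
    if returncode == 0:
        return "ok"
    if "instruction not supported on this GPU" in stderr:
        return "unsupported"   # the mnemonic does not exist for this target
    if "operands are not valid for this GPU or mode" in stderr:
        return "bad-operands"  # the mnemonic exists, its operands were rejected
    return "other-error"


def probe(clang: str, mcpu: str, source: str = WMMA_TEST) -> str:
    """Run the probe in a scratch directory so any output file is discarded."""
    with tempfile.TemporaryDirectory() as scratch:
        result = subprocess.run(probe_command(clang, mcpu), input=source + "\n",
                                capture_output=True, text=True, cwd=scratch)
    return classify(result.returncode, result.stderr)


if __name__ == "__main__":
    clang = "/usr/lib/llvm/19/bin/clang++"
    if os.path.exists(clang):  # skip quietly on systems without this slot
        for mcpu in ("gfx803", "gfx900", "gfx1100"):
            print(mcpu, probe(clang, mcpu))
```

Based on the diagnostics quoted above, Clang 19 should yield "unsupported" for gfx803 and gfx900 but "bad-operands" for gfx1100. A check that treats every nonzero exit as "no WMMA" would fold both cases into one, which is consistent with the cached and derived caps disagreeing.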
Assembling the same test with Clang 18 succeeds and produces a binary, so only Clang 19 (and possibly later versions) rejects the test.
Temporarily removing Clang 19 from my system while building rocBLAS works around the problem. This suggests an alternative: patching the test in dev-util/Tensile may be best if newer Clang versions are to be supported, but fixing the build may simply be a matter of forcing an older Clang. Until then, the workaround is fairly low-effort, provided a binpkg of the newer Clang is saved so it can be restored after rocBLAS is installed.
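Forcing an older Clang starts with locating one. The Python sketch below assumes Gentoo's slotted layout /usr/lib/llvm/<N>/bin/clang++ (the same layout as the /usr/lib/llvm/19/bin/clang++ path in the report) and picks the newest installed slot not newer than a given cap. The helper names are hypothetical; this does not show how the rocBLAS ebuild or Tensile actually choose their compiler.

```python
import os
from typing import Iterable, List, Optional

# Assumes Gentoo's slotted layout /usr/lib/llvm/<N>/bin/clang++. Hypothetical
# helpers; they do not reflect how rocBLAS or Tensile pick a compiler.
LLVM_ROOT = "/usr/lib/llvm"


def installed_slots(root: str = LLVM_ROOT) -> List[int]:
    """List numeric LLVM slots under root that contain a clang++ binary."""
    if not os.path.isdir(root):
        return []
    return sorted(
        int(entry) for entry in os.listdir(root)
        if entry.isdigit()
        and os.path.exists(os.path.join(root, entry, "bin", "clang++"))
    )


def pick_clang(slots: Iterable[int], max_slot: int,
               root: str = LLVM_ROOT) -> Optional[str]:
    """Return clang++ from the newest slot not newer than max_slot, or None."""
    eligible = [slot for slot in slots if slot <= max_slot]
    if not eligible:
        return None
    return os.path.join(root, str(max(eligible)), "bin", "clang++")


if __name__ == "__main__":
    # Clang 18 accepted the WMMA test in the report, so cap the choice at 18.
    print(pick_clang(installed_slots(), 18))
```

Picking the path only covers half of the workaround; how it would be passed to the rocBLAS build is left open here.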