sci-libs/rocBLAS fails on tensile command failure with Tensile::FATAL: Cached asm caps differ from derived asm caps for (11, 0, 0).

Not sure if this is rocm related, but I have an AMD card so...

Reproducible: Always

Steps to Reproduce:
1. emerge sci-libs/rocBLAS with rocm

Actual Results:
Tensile::FATAL: Cached asm caps differ from derived asm caps for (11, 0, 0).

There is an upstream issue [1] and a proposed fix [2] that allowed me to work around this problem.

[1] https://github.com/ROCm/Tensile/issues/1757
[2] https://github.com/ROCm/Tensile/pull/1898/commits/4f7f6b6523b3575b4229e8713383166df0b121a0

I applied the same(-ish) logic in /usr/lib/python3.12/site-packages/Tensile/Common.py#2020 and the build got past that error.
Created attachment 903727
build log
Latest Tensile release already has a fix...
Created attachment 904115
sci-libs/rocBLAS-6.1.1 build failure log with clang++ 19 as the TENSILE_ROCM_ASSEMBLER

Hi there,

I'm having a similar problem. It's not clear whether you're using clang++ 18 or 19 as the assembler for Tensile. You can look under /usr/lib/llvm: if there's a 19 there, then by default Tensile should find 19 first and use it as its assembler.

If you are also using 19 by default, then this problem may have nothing to do with the bug you posted above. The problem I'm having is that the following command works fine with clang++ 18:

> echo "v_wmma_f32_16x16x16_f16 v[0:7], v[8:15], v[16:23], v[0:7]" | /usr/lib/llvm/18/bin/clang++ -x assembler -target amdgcn-amdhsa -mcpu=gfx1100 -mwavefrontsize64 -

But with clang++ 19 I get this error:

> <stdin>:1:1: error: operands are not valid for this GPU or mode
> v_wmma_f32_16x16x16_f16 v[0:7], v[8:15], v[16:23], v[0:7]
> ^

For gfx1100, the "HasWMMA" capability is the only difference between AsmCaps.py and `derivedAsmCaps`.

After manually specifying clang++ 18, I got past the problem I encountered here (Tensile::FATAL: Cached asm caps differ from derived asm caps for (11, 0, 0)):

> env TENSILE_ROCM_ASSEMBLER_PATH=/usr/lib/llvm/18/bin/clang++ emerge -vj1 sci-libs/rocBLAS
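To see which installed LLVM slots accept the WMMA instruction, the manual check above can be wrapped in a small loop. This is only a sketch: the instruction and target flags are copied from the command above, while the `probe_wmma` function name, the `/usr/lib/llvm/*/bin/clang++` glob, and the added `-c -o /dev/null` (so that no output file is left behind) are assumptions made for illustration, not something Tensile itself runs.

```shell
#!/bin/sh
# Hypothetical helper: returns 0 if the given clang++ assembles the gfx1100
# WMMA instruction quoted in this report, non-zero otherwise.
probe_wmma() {
    # $1: path to a clang++ binary
    # -c -o /dev/null is an addition to the original command so that the
    # probe only assembles and discards the object file.
    echo "v_wmma_f32_16x16x16_f16 v[0:7], v[8:15], v[16:23], v[0:7]" |
        "$1" -x assembler -target amdgcn-amdhsa -mcpu=gfx1100 \
             -mwavefrontsize64 -c -o /dev/null - >/dev/null 2>&1
}

# Assumed Gentoo slot layout; unmatched globs are skipped by the -x test.
for cxx in /usr/lib/llvm/*/bin/clang++; do
    [ -x "$cxx" ] || continue
    if probe_wmma "$cxx"; then
        echo "$cxx: accepts v_wmma_f32_16x16x16_f16"
    else
        echo "$cxx: rejects v_wmma_f32_16x16x16_f16"
    fi
done
```

A slot reported as accepting the instruction is a candidate for TENSILE_ROCM_ASSEMBLER_PATH, as in the emerge command above.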
(In reply to Ryan Tsien from comment #3)
> After manually specifying clang++ 18, I solved the problem (Tensile::FATAL:
> Cached asm caps differ from derived asm caps for (11, 0, 0)) I encountered
> here:
>
> > env TENSILE_ROCM_ASSEMBLER_PATH=/usr/lib/llvm/18/bin/clang++ emerge -vj1 sci-libs/rocBLAS

Can confirm. I removed *-19 just in case and managed to proceed with the build.
As for building with version 19, the fix in the PR I linked in the original post also worked for me.
commit 5dfd33faefc086e8c5b056f3591eec3c55642d5e
Author: Patrick Lauer <patrick@gentoo.org>
Date:   Sat Nov 9 21:06:19 2024 +0000

    sci-libs/rocBLAS: Restrict to llvm-18

    Explodes violently with llvm-19

    Signed-off-by: Patrick Lauer <patrick@gentoo.org>