Bug 953116 - sci-libs/rocBLAS-5.7.1-r2::gentoo failed Tensile::FATAL: Cached asm caps differ from derived asm caps for (11, 0, 0)
Status: UNCONFIRMED
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: Current packages
Hardware: All Linux
Importance: Normal normal
Assignee: Gentoo Science Related Packages
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2025-04-04 11:32 UTC by Balint Dobai-Pataky
Modified: 2025-04-06 07:52 UTC
CC List: 5 users

See Also:
Package list:
Runtime testing required: ---


Attachments
emerge --info (emerge.info, 9.37 KB, text/plain)
2025-04-04 11:32 UTC, Balint Dobai-Pataky
sci-libs:rocBLAS-5.7.1-r2:20250404-112432.log (9.36 KB, text/x-log)
2025-04-04 11:33 UTC, Balint Dobai-Pataky

Description Balint Dobai-Pataky 2025-04-04 11:32:06 UTC
Tensile::FATAL: Cached asm caps differ from derived asm caps for (11, 0, 0)
CMake Error at /usr/lib64/cmake/Tensile/TensileConfig.cmake:277 (message):
  Error creating Tensile library: 255
Call Stack (most recent call first):
  library/src/CMakeLists.txt:89 (TensileCreateLibraryFiles)


Reproducible: Always
Comment 1 Balint Dobai-Pataky 2025-04-04 11:32:33 UTC
Created attachment 923565
emerge --info
Comment 2 Balint Dobai-Pataky 2025-04-04 11:33:13 UTC
Created attachment 923566
sci-libs:rocBLAS-5.7.1-r2:20250404-112432.log
Comment 3 Adam Wenocur 2025-04-06 03:23:19 UTC
From what I can tell, this is because the Clang 19 assembler is reporting that gfx803 is incapable of WMMA, while the cached value is that it supports WMMA. I don't know what changed to create this inconsistency, though. Looking at the rocWMMA package, it appears gfx803 was never supported in the first place; if this is true, patching the cached/expected value in dev-util/Tensile should fix the configuration.
Comment 4 Adam Wenocur 2025-04-06 03:54:32 UTC
I was looking in the wrong place to find which architecture mismatches: I build mine for gfx803, but the check fails for at least gfx1100 and gfx1101, which rocWMMA does support.
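A minimal Python sketch of the kind of consistency check that produces this error, assuming the caps for each ISA are stored as a flat name-to-boolean table. The function names and the cap name `HasWMMA` are illustrative assumptions, not Tensile's actual code. The ISA tuple (11, 0, 0) from the error message corresponds to gfx1100.

```python
from typing import Dict, Optional, Tuple

# Hypothetical representation: ISA version tuple and {cap name: bool}.
IsaVersion = Tuple[int, int, int]
Caps = Dict[str, bool]


def diff_caps(cached: Caps, derived: Caps) -> Dict[str, Tuple[Optional[bool], Optional[bool]]]:
    """Return {cap: (cached value, derived value)} for every cap that disagrees."""
    names = sorted(set(cached) | set(derived))
    return {
        name: (cached.get(name), derived.get(name))
        for name in names
        if cached.get(name) != derived.get(name)
    }


def check_caps(isa: IsaVersion, cached: Caps, derived: Caps) -> None:
    """Fail loudly, as the build log does, when cached and probed caps disagree."""
    mismatches = diff_caps(cached, derived)
    if mismatches:
        details = ", ".join(
            f"{name}: cached={old} derived={new}"
            for name, (old, new) in mismatches.items()
        )
        raise RuntimeError(
            f"Cached asm caps differ from derived asm caps for {isa} ({details})"
        )


if __name__ == "__main__":
    # Illustrative values mirroring the report: the cache says gfx1100 has WMMA,
    # but the Clang 19 probe rejected the test instruction.
    cached = {"HasWMMA": True}
    derived = {"HasWMMA": False}
    try:
        check_caps((11, 0, 0), cached, derived)
    except RuntimeError as err:
        print(err)
```

Keeping the per-cap diff separate from the fatal check makes it easy to see which capability flipped, which is the question Comments 3 and 4 are trying to answer.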
Comment 5 Adam Wenocur 2025-04-06 07:12:12 UTC
Looking at the code sent to the assembler:
v_wmma_f32_16x16x16_f16 v[0:7], v[8:15], v[16:23], v[0:7]


with the command line:

/usr/lib/llvm/19/bin/clang++ -x assembler -target amdgcn-amdhsa -mcpu=gfx1100 -mwavefrontsize64 -

the test fails with the output:
<stdin>:1:1: error: operands are not valid for this GPU or mode

For comparison, the output for gfx803 or gfx900, which truly do not support WMMA, is:
<stdin>:1:1: error: instruction not supported on this GPU


It's evident that the test itself is malformed. An appropriate patch to dev-util/Tensile might give this test operands that are valid for that instruction across all microarchitectures. The test is defined in Common.py in the Tensile package.
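The probe in this comment can be reproduced with a short Python script. It pipes the instruction quoted above into clang++ with the same arguments and sorts the outcome by the two error messages seen in the report. The `-c -o /dev/null` arguments are an addition so that no output file is left behind. Tensile's own invocation may differ.

```python
import os
import subprocess

# Probe instruction and assembler arguments as quoted in the report.
PROBE = "v_wmma_f32_16x16x16_f16 v[0:7], v[8:15], v[16:23], v[0:7]\n"
CLANG = "/usr/lib/llvm/19/bin/clang++"


def classify(returncode: int, stderr: str) -> str:
    """Map an assembler result onto the cases discussed in the report."""
    if returncode == 0:
        return "assembled"
    if "instruction not supported on this GPU" in stderr:
        return "unsupported-instruction"  # seen for gfx803 and gfx900
    if "operands are not valid for this GPU or mode" in stderr:
        return "invalid-operands"  # seen for gfx1100 with Clang 19
    return "other-error"


def probe(clang: str, mcpu: str) -> str:
    cmd = [
        clang, "-x", "assembler", "-target", "amdgcn-amdhsa",
        f"-mcpu={mcpu}", "-mwavefrontsize64",
        "-c", "-o", os.devnull,  # added: assemble only, discard the object
        "-",                     # read the source from stdin
    ]
    result = subprocess.run(cmd, input=PROBE, capture_output=True, text=True)
    return classify(result.returncode, result.stderr)


if __name__ == "__main__":
    if os.path.exists(CLANG):
        for mcpu in ("gfx803", "gfx900", "gfx1100", "gfx1101"):
            print(mcpu, probe(CLANG, mcpu))
    else:
        print(f"{CLANG} not found")
```

Distinguishing "invalid operands" from "unsupported instruction" is the point of the classification: only the first indicates a GPU that has WMMA but rejects this particular test.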
Comment 6 Adam Wenocur 2025-04-06 07:24:55 UTC
When I assemble the same test with Clang 18, it emits a binary, so the test appears to be malformed only for Clang 19 (and possibly later versions).
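To check whether later Clang versions are affected too, the same probe can be run against every LLVM slot installed under /usr/lib/llvm. This layout is implied by the path in the command line above. The sketch assumes that each slot provides bin/clang++. It reports only whether gfx1100 assembles the test, which is the difference observed between Clang 18 and Clang 19.

```python
import os
import subprocess
from typing import Iterable, List

PROBE = "v_wmma_f32_16x16x16_f16 v[0:7], v[8:15], v[16:23], v[0:7]\n"
LLVM_ROOT = "/usr/lib/llvm"  # Gentoo installs one directory per LLVM slot here


def installed_slots(names: Iterable[str]) -> List[str]:
    """Keep purely numeric slot directory names, oldest first."""
    return sorted((name for name in names if name.isdigit()), key=int)


def assembles(clang: str, mcpu: str = "gfx1100") -> bool:
    # Same arguments as the report, plus -c -o /dev/null to discard the object.
    cmd = [clang, "-x", "assembler", "-target", "amdgcn-amdhsa",
           f"-mcpu={mcpu}", "-mwavefrontsize64", "-c", "-o", os.devnull, "-"]
    result = subprocess.run(cmd, input=PROBE, capture_output=True, text=True)
    return result.returncode == 0


if __name__ == "__main__":
    slots = installed_slots(os.listdir(LLVM_ROOT)) if os.path.isdir(LLVM_ROOT) else []
    for slot in slots:
        clang = os.path.join(LLVM_ROOT, slot, "bin", "clang++")
        if os.path.exists(clang):
            print(f"LLVM {slot}:", "assembles" if assembles(clang) else "rejects")
```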
Comment 7 Adam Wenocur 2025-04-06 07:52:19 UTC
Temporarily removing Clang 19 from my system while building rocBLAS works around the problem.

This suggests an alternative. Patching the test in dev-util/Tensile may be best if newer Clang versions are to be supported, but fixing the build may simply be a matter of forcing an older Clang to be used. Until then, the workaround is fairly low-effort, provided a binpkg of the newer Clang is saved so it can be restored after rocBLAS is installed.