Bug 940231 - sci-libs/rocBLAS (+rocm) fails on tensile command
Status: RESOLVED FIXED
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: Current packages
Hardware: All Linux
Importance: Normal normal
Assignee: Gentoo Science Related Packages
Keywords: PullRequest
Reported: 2024-09-25 08:04 UTC by Amit Ugol
Modified: 2024-11-10 06:23 UTC
CC List: 8 users

Attachments
build log (build.log, 8.92 KB, text/x-log)
2024-09-25 08:05 UTC, Amit Ugol
sci-libs/rocBLAS-6.1.1 build failure log with clang++ 19 as the TENSILE_ROCM_ASSEMBLER (sci-libs:rocBLAS-6.1.1:20240928-182704.log, 182.85 KB, application/octet-stream)
2024-09-28 19:16 UTC, Ryan Tsien

Description Amit Ugol 2024-09-25 08:04:35 UTC
sci-libs/rocBLAS fails to build: the Tensile command aborts with Tensile::FATAL: Cached asm caps differ from derived asm caps for (11, 0, 0).
I am not sure whether this is ROCm-related, but I do have an AMD card.

Reproducible: Always

Steps to Reproduce:
1. emerge sci-libs/rocBLAS with the rocm USE flag enabled
Actual Results:
Tensile::FATAL: Cached asm caps differ from derived asm caps for (11, 0, 0).


There is an upstream issue [1] and a proposed fix [2] that allowed me to work around this problem.
1 - https://github.com/ROCm/Tensile/issues/1757
2 - https://github.com/ROCm/Tensile/pull/1898/commits/4f7f6b6523b3575b4229e8713383166df0b121a0

I applied roughly the same logic in /usr/lib/python3.12/site-packages/Tensile/Common.py (around line 2020), and the build got past this error.
Comment 1 Amit Ugol 2024-09-25 08:05:05 UTC
Created attachment 903727 [details]
build log
Comment 2 Amit Ugol 2024-09-25 08:08:30 UTC
The latest Tensile release already includes a fix.
Comment 3 Ryan Tsien 2024-09-28 19:16:42 UTC
Created attachment 904115 [details]
sci-libs/rocBLAS-6.1.1 build failure log with clang++ 19 as the TENSILE_ROCM_ASSEMBLER

Hi there,

I'm having a similar problem. It's not clear whether you're using clang++ 18 or 19 as the assembler for Tensile. You can check under /usr/lib/llvm: if there is a 19 there, then by default Tensile should find 19 first and use it as its assembler.
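
A quick way to check is sketched below in Python. It lists the LLVM slots under /usr/lib/llvm that ship a clang++, highest first. The helper name and the highest-first ordering are assumptions made for illustration; this is not Tensile's actual lookup code.

```python
from pathlib import Path


def list_llvm_slots(root="/usr/lib/llvm"):
    """Return (slot, clang++ path) pairs under root, highest slot first.

    Hypothetical helper: only numeric directory names that contain
    bin/clang++ are considered.
    """
    base = Path(root)
    if not base.is_dir():
        return []
    slots = []
    for entry in base.iterdir():
        clang = entry / "bin" / "clang++"
        if entry.name.isdigit() and clang.exists():
            slots.append((int(entry.name), str(clang)))
    # Sort by slot number so that e.g. 19 is listed before 18.
    return sorted(slots, reverse=True)


if __name__ == "__main__":
    for slot, clang in list_llvm_slots():
        print(slot, clang)
```

If 19 shows up at the top of the output, it is probably the one being picked up by default.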

If you are also using 19 by default, then this problem may have nothing to do with the bug you linked above. The problem I'm having is that the following command works fine with clang++ 18:

> echo "v_wmma_f32_16x16x16_f16 v[0:7], v[8:15], v[16:23], v[0:7]" | /usr/lib/llvm/18/bin/clang++ -x assembler -target amdgcn-amdhsa -mcpu=gfx1100 -mwavefrontsize64 -

But when I use clang++ 19, I get this error:

> <stdin>:1:1: error: operands are not valid for this GPU or mode
> v_wmma_f32_16x16x16_f16 v[0:7], v[8:15], v[16:23], v[0:7]
> ^
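
The same check can be scripted for both versions. The following is a minimal Python sketch that reuses the exact flags from the command above; the function name and the scratch-directory handling are assumptions added for illustration, and this is not how Tensile itself runs the probe.

```python
import subprocess
import tempfile

WMMA_PROBE = "v_wmma_f32_16x16x16_f16 v[0:7], v[8:15], v[16:23], v[0:7]"


def assembles(clang_path, instruction=WMMA_PROBE, mcpu="gfx1100"):
    """Return True if clang_path accepts the instruction for the given GPU."""
    cmd = [clang_path, "-x", "assembler", "-target", "amdgcn-amdhsa",
           f"-mcpu={mcpu}", "-mwavefrontsize64", "-"]
    # Run in a scratch directory so any output file is discarded afterwards.
    with tempfile.TemporaryDirectory() as scratch:
        try:
            result = subprocess.run(cmd, input=instruction + "\n", text=True,
                                    capture_output=True, cwd=scratch)
        except OSError:
            # clang++ is missing or not executable at this path.
            return False
    return result.returncode == 0


if __name__ == "__main__":
    for version in (18, 19):
        path = f"/usr/lib/llvm/{version}/bin/clang++"
        print(version, assembles(path))
```

A False result for a given version means either that clang++ is not installed in that slot or that it rejected the instruction.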

The "HasWMMA" capability for gfx1100 is the only difference between the cached values in AsmCaps.py and `derivedAsmCaps`.
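
To find which capability differs, a small comparison helper is enough. The Python sketch below is hypothetical: the function name and the sample values are made up for illustration and are not taken from Tensile or from AsmCaps.py.

```python
def diff_caps(cached, derived):
    """Return {name: (cached_value, derived_value)} for entries that differ."""
    names = sorted(set(cached) | set(derived))
    return {
        name: (cached.get(name), derived.get(name))
        for name in names
        if cached.get(name) != derived.get(name)
    }


if __name__ == "__main__":
    # Made-up values for illustration only; not copied from AsmCaps.py.
    cached = {"HasWMMA": True, "HasExampleCap": True}
    derived = {"HasWMMA": False, "HasExampleCap": True}
    print(diff_caps(cached, derived))
```

Each entry in the result pairs the cached value with the derived one, so a capability missing on one side appears as None.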

Manually specifying clang++ 18 solved the problem I encountered (Tensile::FATAL: Cached asm caps differ from derived asm caps for (11, 0, 0)):

> env TENSILE_ROCM_ASSEMBLER_PATH=/usr/lib/llvm/18/bin/clang++ emerge -vj1 sci-libs/rocBLAS
Comment 4 Amit Ugol 2024-09-29 16:38:11 UTC
(In reply to Ryan Tsien from comment #3)

Can confirm. I removed *-19 just in case and managed to proceed with the build. As for building with version 19, the fix from the PR I linked in the original post also worked for me.
Comment 5 Patrick Lauer gentoo-dev 2024-11-10 06:23:26 UTC
commit 5dfd33faefc086e8c5b056f3591eec3c55642d5e
Author: Patrick Lauer <patrick@gentoo.org>
Date:   Sat Nov 9 21:06:19 2024 +0000

    sci-libs/rocBLAS: Restrict to llvm-18
    
    Explodes violently with llvm-19
    
    Signed-off-by: Patrick Lauer <patrick@gentoo.org>