940231 – sci-libs/rocBLAS (+rocm) fails on tensile command

Status:	RESOLVED FIXED

Alias:	None

Product:	Gentoo Linux
Classification:	Unclassified
Component:	Current packages
Hardware:	All Linux

Importance:	Normal normal
Assignee:	Gentoo Science Related Packages

URL:
Whiteboard:
Keywords:	PullRequest

Depends on:
Blocks:

Reported:	2024-09-25 08:04 UTC by Amit Ugol
Modified:	2024-11-10 06:23 UTC
CC List:	8 users

See Also:	https://github.com/gentoo/gentoo/pull/39179
Package list:
Runtime testing required:	---

Attachments
build log (build.log, 8.92 KB, text/x-log) 2024-09-25 08:05 UTC, Amit Ugol
sci-libs/rocBLAS-6.1.1 build failure log with clang++ 19 as the TENSILE_ROCM_ASSEMBLER (sci-libs:rocBLAS-6.1.1:20240928-182704.log, 182.85 KB, application/octet-stream) 2024-09-28 19:16 UTC, Ryan Tsien

Description Amit Ugol 2024-09-25 08:04:35 UTC

sci-libs/rocBLAS fails to build: the Tensile command aborts with "Tensile::FATAL: Cached asm caps differ from derived asm caps for (11, 0, 0)."
I'm not sure whether this is ROCm-related, but I do have an AMD card.




Reproducible: Always

Steps to Reproduce:
1. emerge sci-libs/rocBLAS with rocm
Actual Results:  
Tensile::FATAL: Cached asm caps differ from derived asm caps for (11, 0, 0).


There is an upstream issue [1] and a proposed fix [2] that let me work around the problem.
[1] https://github.com/ROCm/Tensile/issues/1757
[2] https://github.com/ROCm/Tensile/pull/1898/commits/4f7f6b6523b3575b4229e8713383166df0b121a0

I applied roughly the same logic to /usr/lib/python3.12/site-packages/Tensile/Common.py#2020, and the build got past this error.

Comment 1 Amit Ugol 2024-09-25 08:05:05 UTC

Created attachment 903727 [details]
build log

Comment 2 Amit Ugol 2024-09-25 08:08:30 UTC

The latest Tensile release already includes a fix.

Comment 3 Ryan Tsien 2024-09-28 19:16:42 UTC

Created attachment 904115 [details]
sci-libs/rocBLAS-6.1.1 build failure log with clang++ 19 as the TENSILE_ROCM_ASSEMBLER

Hi there,

I'm having a similar problem. It's not clear whether you're using clang++ 18 or 19 as the assembler for Tensile. You can check under /usr/lib/llvm: if there is a 19 directory, Tensile should by default find 19 first and use it as its assembler.

If you are also using 19 by default, then this problem may have nothing to do with the bug you posted above. The problem I'm having here is that the following command works fine when using clang++ 18:

> echo "v_wmma_f32_16x16x16_f16 v[0:7], v[8:15], v[16:23], v[0:7]" | /usr/lib/llvm/18/bin/clang++ -x assembler -target amdgcn-amdhsa -mcpu=gfx1100 -mwavefrontsize64 -

But when I use clang++ 19, I get the error

> <stdin>:1:1: error: operands are not valid for this GPU or mode
> v_wmma_f32_16x16x16_f16 v[0:7], v[8:15], v[16:23], v[0:7]
> ^

For gfx1100, the "HasWMMA" capability is the only difference between the cached values in AsmCaps.py and `derivedAsmCaps`.
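
To illustrate what such a mismatch looks like, here is a minimal Python sketch that compares two capability tables and lists the entries that differ. It is not Tensile's actual code: the helper `diff_caps`, the `ExampleOtherCap` entry, and the True/False values are assumptions for illustration. Only the name "HasWMMA" comes from this report.

```python
def diff_caps(cached, derived):
    """Return {name: (cached_value, derived_value)} for each capability that differs."""
    names = sorted(set(cached) | set(derived))
    return {
        name: (cached.get(name), derived.get(name))
        for name in names
        if cached.get(name) != derived.get(name)
    }


# Hypothetical tables for gfx1100, i.e. (11, 0, 0). Only the "HasWMMA" name is
# taken from the report; the values and the second entry are made up.
cached = {"HasWMMA": True, "ExampleOtherCap": True}
derived = {"HasWMMA": False, "ExampleOtherCap": True}

mismatches = diff_caps(cached, derived)
for name, (old, new) in mismatches.items():
    print(f"{name}: cached={old} derived={new}")
```

A single differing entry is enough for the two tables to compare unequal, which would be consistent with the FATAL message quoted in this bug.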

Manually specifying clang++ 18 solved the problem I encountered (Tensile::FATAL: Cached asm caps differ from derived asm caps for (11, 0, 0)):

> env TENSILE_ROCM_ASSEMBLER_PATH=/usr/lib/llvm/18/bin/clang++ emerge -vj1 sci-libs/rocBLAS
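
To check several installed LLVM slots at once, the manual test above could be scripted roughly as follows. This is only a sketch: it reuses the exact instruction and flags from the command in this comment, assumes the compilers live at /usr/lib/llvm/*/bin/clang++, and treats exit status 0 as "accepted". The function names `accepts` and `survey` are made up for this example.

```python
import glob
import subprocess

# Instruction and flags copied from the manual test in this comment.
WMMA_LINE = "v_wmma_f32_16x16x16_f16 v[0:7], v[8:15], v[16:23], v[0:7]\n"
FLAGS = ["-x", "assembler", "-target", "amdgcn-amdhsa",
         "-mcpu=gfx1100", "-mwavefrontsize64", "-"]


def accepts(assembler, source=WMMA_LINE, run=subprocess.run):
    """Return True if `assembler` exits with status 0 when fed `source` on stdin."""
    result = run([assembler, *FLAGS], input=source, text=True, capture_output=True)
    return result.returncode == 0


def survey(pattern="/usr/lib/llvm/*/bin/clang++", run=subprocess.run):
    """Map each clang++ matching `pattern` (an assumed layout) to accepts()."""
    return {path: accepts(path, run=run) for path in sorted(glob.glob(pattern))}


if __name__ == "__main__":
    # Like the manual command, this may leave an output file in the current directory.
    for path, ok in survey().items():
        print(f"{path}: {'accepted' if ok else 'rejected'}")
```

The `run` parameter exists only so the check can be exercised without a real compiler. On a system with no matching clang++ the script prints nothing.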

Comment 4 Amit Ugol 2024-09-29 16:38:11 UTC

(In reply to Ryan Tsien from comment #3)

Can confirm. I removed *-19 just in case, and the build then went through. As for building with version 19, the fix from the PR I linked in the original post also worked for me.

Comment 5 Patrick Lauer (gentoo-dev) 2024-11-10 06:23:26 UTC

commit 5dfd33faefc086e8c5b056f3591eec3c55642d5e
Author: Patrick Lauer <patrick@gentoo.org>
Date:   Sat Nov 9 21:06:19 2024 +0000

    sci-libs/rocBLAS: Restrict to llvm-18
    
    Explodes violently with llvm-19
    
    Signed-off-by: Patrick Lauer <patrick@gentoo.org>