Bug 809392 - sys-devel/llvm-roc-4.3.0 fails to build rocBLAS
Summary: sys-devel/llvm-roc-4.3.0 fails to build rocBLAS
Status: RESOLVED FIXED
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: Current packages
Hardware: All Linux
Importance: Normal normal
Assignee: Craig Andrews
URL:
Whiteboard:
Keywords: PullRequest
Depends on:
Blocks:
 
Reported: 2021-08-21 10:47 UTC by Yiyang Wu
Modified: 2021-08-26 12:39 UTC
0 users

See Also:
Package list:
Runtime testing required: ---


Attachments
build.log (rocBLAS-4.3-buildfail.log,589.32 KB, text/plain)
2021-08-21 10:51 UTC, Yiyang Wu
temp/environment (rocBLAS-4.3-environment,123.76 KB, text/plain)
2021-08-21 10:51 UTC, Yiyang Wu

Description Yiyang Wu 2021-08-21 10:47:25 UTC
After bumping to ROCm version 4.3 (sys-devel/llvm-roc-4.3.0), sci-libs/rocBLAS-4.3 (in development, awaiting merge) fails to build with the following errors:

[45/276] /opt/gentoo/usr/lib/hip/bin/hipcc -DBUILD_WITH_TENSILE=1 -DROCBLAS_INTERNAL_API -DROCM_USE_FLOAT16 -DTENSILE_DEFAULT_SERIALIZATION -DTENSILE_MSGPACK=1 -DTENSILE_USE_HIP -DUSE_TENSILE_HOST -D__HIP_PLATFORM_AMD__=1 -D__HIP_PLATFORM_HCC__=1 -Drocblas_EXPORTS -I/tmp/portage/sci-libs/rocBLAS-4.3.0/work/rocBLAS-rocm-4.3.0/library/include -I/tmp/portage/sci-libs/rocBLAS-4.3.0/work/rocBLAS-rocm-4.3.0/library/include/internal -I/tmp/portage/sci-libs/rocBLAS-4.3.0/work/rocBLAS-rocm-4.3.0/library/src/include -I/tmp/portage/sci-libs/rocBLAS-4.3.0/work/rocBLAS-4.3.0_build/include/internal -I/tmp/portage/sci-libs/rocBLAS-4.3.0/work/rocBLAS-rocm-4.3.0/library/src/blas3/Tensile -I/tmp/portage/sci-libs/rocBLAS-4.3.0/work/rocBLAS-4.3.0_build/include -I/tmp/portage/sci-libs/rocBLAS-4.3.0/work/rocBLAS-4.3.0_build/virtualenv/lib/python3.9/site-packages/Tensile/Source/lib/include  -march=native -mtune=native -O2 -pipe -D__HIP_HCC_COMPAT_MODE__=1 -fPIC -fvisibility=hidden -fvisibility-inlines-hidden -Wno-unused-command-line-argument -mf16c -Werror=vla -xhip --hip-device-lib-path=/opt/gentoo/usr/lib/amdgcn/bitcode --offload-arch=gfx906:xnack- -std=c++14 -MD -MT library/src/CMakeFiles/rocblas.dir/blas_ex/rocblas_nrm2_strided_batched_ex.cpp.o -MF library/src/CMakeFiles/rocblas.dir/blas_ex/rocblas_nrm2_strided_batched_ex.cpp.o.d -o library/src/CMakeFiles/rocblas.dir/blas_ex/rocblas_nrm2_strided_batched_ex.cpp.o -c /tmp/portage/sci-libs/rocBLAS-4.3.0/work/rocBLAS-rocm-4.3.0/library/src/blas_ex/rocblas_nrm2_strided_batched_ex.cpp
FAILED: library/src/CMakeFiles/rocblas.dir/blas_ex/rocblas_nrm2_strided_batched_ex.cpp.o
/opt/gentoo/usr/lib/hip/bin/hipcc -DBUILD_WITH_TENSILE=1 -DROCBLAS_INTERNAL_API -DROCM_USE_FLOAT16 -DTENSILE_DEFAULT_SERIALIZATION -DTENSILE_MSGPACK=1 -DTENSILE_USE_HIP -DUSE_TENSILE_HOST -D__HIP_PLATFORM_AMD__=1 -D__HIP_PLATFORM_HCC__=1 -Drocblas_EXPORTS -I/tmp/portage/sci-libs/rocBLAS-4.3.0/work/rocBLAS-rocm-4.3.0/library/include -I/tmp/portage/sci-libs/rocBLAS-4.3.0/work/rocBLAS-rocm-4.3.0/library/include/internal -I/tmp/portage/sci-libs/rocBLAS-4.3.0/work/rocBLAS-rocm-4.3.0/library/src/include -I/tmp/portage/sci-libs/rocBLAS-4.3.0/work/rocBLAS-4.3.0_build/include/internal -I/tmp/portage/sci-libs/rocBLAS-4.3.0/work/rocBLAS-rocm-4.3.0/library/src/blas3/Tensile -I/tmp/portage/sci-libs/rocBLAS-4.3.0/work/rocBLAS-4.3.0_build/include -I/tmp/portage/sci-libs/rocBLAS-4.3.0/work/rocBLAS-4.3.0_build/virtualenv/lib/python3.9/site-packages/Tensile/Source/lib/include  -march=native -mtune=native -O2 -pipe -D__HIP_HCC_COMPAT_MODE__=1 -fPIC -fvisibility=hidden -fvisibility-inlines-hidden -Wno-unused-command-line-argument -mf16c -Werror=vla -xhip --hip-device-lib-path=/opt/gentoo/usr/lib/amdgcn/bitcode --offload-arch=gfx906:xnack- -std=c++14 -MD -MT library/src/CMakeFiles/rocblas.dir/blas_ex/rocblas_nrm2_strided_batched_ex.cpp.o -MF library/src/CMakeFiles/rocblas.dir/blas_ex/rocblas_nrm2_strided_batched_ex.cpp.o.d -o library/src/CMakeFiles/rocblas.dir/blas_ex/rocblas_nrm2_strided_batched_ex.cpp.o -c /tmp/portage/sci-libs/rocBLAS-4.3.0/work/rocBLAS-rocm-4.3.0/library/src/blas_ex/rocblas_nrm2_strided_batched_ex.cpp
In file included from /tmp/portage/sci-libs/rocBLAS-4.3.0/work/rocBLAS-rocm-4.3.0/library/src/blas_ex/rocblas_nrm2_strided_batched_ex.cpp:5:
In file included from /tmp/portage/sci-libs/rocBLAS-4.3.0/work/rocBLAS-rocm-4.3.0/library/src/blas_ex/../blas1/rocblas_reduction_impl.hpp:11:
In file included from /tmp/portage/sci-libs/rocBLAS-4.3.0/work/rocBLAS-rocm-4.3.0/library/src/blas_ex/../blas1/rocblas_reduction_template.hpp:7:
/tmp/portage/sci-libs/rocBLAS-4.3.0/work/rocBLAS-rocm-4.3.0/library/src/blas_ex/../blas1/fetch_template.hpp:34:17: error: reference to __host__ function 'norm<float>' in __host__ __device__ function
    return std::norm(A);
                ^
/tmp/portage/sci-libs/rocBLAS-4.3.0/work/rocBLAS-rocm-4.3.0/library/src/blas_ex/../blas1/rocblas_nrm2.hpp:15:17: note: called by 'operator()<float>'
        return {fetch_abs2(x)};
                ^
/tmp/portage/sci-libs/rocBLAS-4.3.0/work/rocBLAS-rocm-4.3.0/library/src/blas_ex/../blas1/reduction_strided_batched.hpp:248:19: note: called by 'rocblas_reduction_strided_batched_kernel_part1<512, rocblas_fetch_nrm2<float>, rocblas_reduce_sum, const float *const *, float>'
        tmp[tx] = FETCH{}(x[tid * incx], tid);
                  ^
/opt/gentoo/usr/lib/gcc/x86_64-pc-linux-gnu/11.2.0/include/g++-v11/complex:1870:5: note: 'norm<float>' declared here
    norm(_Tp __x)
    ^
In file included from /tmp/portage/sci-libs/rocBLAS-4.3.0/work/rocBLAS-rocm-4.3.0/library/src/blas_ex/rocblas_nrm2_strided_batched_ex.cpp:5:
In file included from /tmp/portage/sci-libs/rocBLAS-4.3.0/work/rocBLAS-rocm-4.3.0/library/src/blas_ex/../blas1/rocblas_reduction_impl.hpp:11:
In file included from /tmp/portage/sci-libs/rocBLAS-4.3.0/work/rocBLAS-rocm-4.3.0/library/src/blas_ex/../blas1/rocblas_reduction_template.hpp:7:
/tmp/portage/sci-libs/rocBLAS-4.3.0/work/rocBLAS-rocm-4.3.0/library/src/blas_ex/../blas1/fetch_template.hpp:34:17: error: reference to __host__ function 'norm<double>' in __host__ __device__ function
    return std::norm(A);
                ^
/tmp/portage/sci-libs/rocBLAS-4.3.0/work/rocBLAS-rocm-4.3.0/library/src/blas_ex/../blas1/rocblas_nrm2.hpp:15:17: note: called by 'operator()<double>'
        return {fetch_abs2(x)};
                ^
/tmp/portage/sci-libs/rocBLAS-4.3.0/work/rocBLAS-rocm-4.3.0/library/src/blas_ex/../blas1/reduction_strided_batched.hpp:248:19: note: called by 'rocblas_reduction_strided_batched_kernel_part1<512, rocblas_fetch_nrm2<double>, rocblas_reduce_sum, const double *const *, double>'
        tmp[tx] = FETCH{}(x[tid * incx], tid);
                  ^
/opt/gentoo/usr/lib/gcc/x86_64-pc-linux-gnu/11.2.0/include/g++-v11/complex:1870:5: note: 'norm<double>' declared here
    norm(_Tp __x)
    ^
2 errors generated when compiling for gfx906.

Similar errors follow.

Reproducible: Always

Steps to Reproduce:
1. emerge '=sys-devel/llvm-roc-4.3.0'
2. emerge '=sci-libs/rocBLAS-4.3.0'



After investigation, the cause is the missing ROCM_PATH environment variable: without it, clang does not locate the HIP installation correctly. As a result, hip_runtime.h and the CUDA wrapper headers for the standard <complex> header are included incorrectly, so clang resolves std::norm to the declaration in GCC's <complex>, which is a __host__-only function with no implementation for GPU devices.
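To illustrate the diagnostic, here is a minimal sketch of the pattern, not the rocBLAS source. The names HOST_DEVICE, abs2_via_std, abs2_portable and check_variants are hypothetical and chosen only for this example. The first function has the same shape as the call reported in the log and depends on the toolchain supplying a device-callable std::norm; the second avoids the library call for real arguments.

```cpp
#include <cassert>
#include <complex>

// Hypothetical macro: expands to the HIP attributes only under hipcc,
// so the same file also builds with a plain host compiler.
#if defined(__HIPCC__)
#define HOST_DEVICE __host__ __device__
#else
#define HOST_DEVICE
#endif

// Same shape as the call in the log: relies on the toolchain providing
// a device-callable std::norm through its <complex> wrapper headers.
template <typename T>
HOST_DEVICE T abs2_via_std(T x)
{
    return std::norm(x);
}

// Alternative for real types that needs no library support on the device.
template <typename T>
HOST_DEVICE T abs2_portable(T x)
{
    return x * x;
}

// Host-side check that both variants agree on a sample value.
inline void check_variants()
{
    assert(abs2_via_std(3.0f) == 9.0f);
    assert(abs2_portable(3.0f) == 9.0f);
}
```

Calling check_variants() from a host-only build should pass. This sketch only shows why the lookup of std::norm matters; the fix applied for this bug was on the toolchain side, not in rocBLAS.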

From 4.0.0 to 4.1.0, sys-devel/llvm-roc used llvm-roc-4.0.0-hip-location.patch, which replaced the code that searches for the HIP runtime via $ROCM_PATH with a fixed HIP installation location. From 4.2.0 on, sys-devel/llvm-roc dropped this patch.

Meanwhile, before hip-4.1.0-r1, ROCM_PATH was set in an environment file (env.d/99-hip), so things worked even without llvm-roc-4.0.0-hip-location.patch. From hip-4.1.0-r1 on, ROCM_PATH is no longer set there; the path is written directly into hipvars.pm instead.

Together, these two changes make llvm-roc-4.2 and llvm-roc-4.3 search the include directories incorrectly.
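The interaction of the two lookup strategies can be sketched as follows. This is an illustration only and not the clang driver code or the contents of the patch; the function names, the "/hip" suffix and the fixed prefix "/usr/lib/hip" are assumptions made for the example.

```cpp
#include <cstdlib>
#include <string>

// Hypothetical fixed install prefix; the real value is distribution-specific.
const std::string kFixedHipPath = "/usr/lib/hip";

// Variant A: environment-driven lookup. Returns an empty string when
// ROCM_PATH is unset, leaving the caller without a HIP location.
std::string hip_path_from_env()
{
    const char* rocm = std::getenv("ROCM_PATH");
    return rocm ? std::string(rocm) + "/hip" : std::string();
}

// Variant B: fixed location, independent of the environment.
std::string hip_path_fixed()
{
    return kFixedHipPath;
}
```

With ROCM_PATH unset, variant A yields nothing usable while variant B still returns a path, which mirrors why either the environment file or the hip-location patch was enough on its own, and why removing both broke the lookup.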

Restoring llvm-roc-4.0.0-hip-location.patch is strongly suggested.
Comment 1 Yiyang Wu 2021-08-21 10:51:19 UTC
Created attachment 734872
build.log

I chose asm_lite as the Tensile library set to compile, because the default, "asm_full", takes a long time and fills build.log with tens of thousands of lines.
Comment 2 Yiyang Wu 2021-08-21 10:51:53 UTC
Created attachment 734875
temp/environment
Comment 3 Larry the Git Cow gentoo-dev 2021-08-26 12:39:31 UTC
The bug has been closed via the following commit(s):

https://gitweb.gentoo.org/repo/gentoo.git/commit/?id=e78aa10a00b855cc9ab96fb36d1cebec991530ac

commit e78aa10a00b855cc9ab96fb36d1cebec991530ac
Author:     YiyangWu <xgreenlandforwyy@gmail.com>
AuthorDate: 2021-08-21 11:00:55 +0000
Commit:     Benda Xu <heroxbd@gentoo.org>
CommitDate: 2021-08-26 12:38:58 +0000

    sys-devel/llvm-roc: add hip-location.patch back
    
    Clang from llvm-roc-4.3.0 throws error during compilation of rocm
    packages for GPU devices (e.g. rocBLAS).  The missing of $ROCM_PATH
    and deprecation of hip-location.patch together causes in this
    situation.
    
    This commit update the hip-location.patch so it can be used again.
    
    Closes: https://bugs.gentoo.org/809392
    Closes: https://github.com/gentoo/gentoo/pull/22060
    Package-Manager: Portage-3.0.20, Repoman-3.0.3
    Signed-off-by: Yiyang Wu <xgreenlandforwyy@gmail.com>
    Signed-off-by: Benda Xu <heroxbd@gentoo.org>

 .../files/llvm-roc-4.3.0-hip-location.patch        | 189 +++++++++++++++++++++
 ...m-roc-4.3.0.ebuild => llvm-roc-4.3.0-r1.ebuild} |   1 +
 2 files changed, 190 insertions(+)