I have an AMD Ryzen 7950X CPU and an AMD Radeon 6800XT GPU. The CPU includes a small integrated GPU (APU), an RDNA2 part not entirely dissimilar from the 6800XT. The output from rocm_agent_enumerator (from dev-util/rocminfo) shows 3 agents installed: gfx000, gfx1030, and gfx1036. The gfx000 is presumably just the CPU, the gfx1030 is the 6800XT, and the gfx1036 is the APU.

I want rocBLAS to run on my GPU. I have no particular need to run it on the APU (though it'd be nice). So I set AMDGPU_TARGETS="gfx1030" in make.conf to tell ROCm to target my GPU and ignore the APU. When I try to compile rocBLAS I get this error message:

-- Check for working CXX compiler: /usr/bin/hipcc
-- Check for working CXX compiler: /usr/bin/hipcc - broken
CMake Error at /usr/share/cmake/Modules/CMakeTestCXXCompiler.cmake:62 (message):
The C++ compiler "/usr/bin/hipcc" is not able to compile a simple test program.
It fails with the following output:
Change Dir: /var/tmp/portage/sci-libs/rocBLAS-5.1.3/work/rocBLAS-rocm-5.1.3_build/CMakeFiles/CMakeTmp
Run Build Command(s):/usr/bin/ninja cmTC_a64ff && [1/2] Building CXX object CMakeFiles/cmTC_a64ff.dir/testCXXCompiler.cxx.o
FAILED: CMakeFiles/cmTC_a64ff.dir/testCXXCompiler.cxx.o
/usr/bin/hipcc -O2 -march=native -pipe -Wl,-O1 -Wl,--as-needed -o CMakeFiles/cmTC_a64ff.dir/testCXXCompiler.cxx.o -c /var/tmp/portage/sci-libs/rocBLAS-5.1.3/work/rocBLAS-rocm-5.1.3_build/CMakeFiles/CMakeTmp/testCXXCompiler.cxx
clang-14: warning: -Wl,-O1: 'linker' input unused [-Wunused-command-line-argument]
clang-14: warning: -Wl,--as-needed: 'linker' input unused [-Wunused-command-line-argument]
clang-14: error: invalid target ID 'gfx1036'; format is a processor name followed by an optional colon-delimited list of features followed by an enable/disable sign (e.g., 'gfx908:sramecc+:xnack-')
ninja: build stopped: subcommand failed.

Very well; gfx1036 support doesn't exist yet in llvm:14, but it is supported in llvm:15.
I made local copies of dev-util/hip, dev-util/Tensile, dev-libs/rocm-comgr, dev-libs/rocm-device-libs, and dev-libs/rocr-runtime, and set the max llvm version to 15. I get a similar error message:

Run Build Command(s):/usr/bin/ninja cmTC_f6a41 && [1/2] Building CXX object CMakeFiles/cmTC_f6a41.dir/testCXXCompiler.cxx.o
FAILED: CMakeFiles/cmTC_f6a41.dir/testCXXCompiler.cxx.o
/usr/bin/hipcc -O2 -march=native -pipe -Wl,-O1 -Wl,--as-needed -o CMakeFiles/cmTC_f6a41.dir/testCXXCompiler.cxx.o -c /var/tmp/portage/sci-libs/rocBLAS-5.1.3/work/rocBLAS-rocm-5.1.3_build/CMakeFiles/CMakeTmp/testCXXCompiler.cxx
clang-15: warning: -Wl,-O1: 'linker' input unused [-Wunused-command-line-argument]
clang-15: warning: -Wl,--as-needed: 'linker' input unused [-Wunused-command-line-argument]
clang-15: error: cannot find ROCm device library for gfx1036; provide its path via '--rocm-path' or '--rocm-device-lib-path', or pass '-nogpulib' to build without ROCm device library
ninja: build stopped: subcommand failed.

CMake probably shouldn't be testing the compiler by building against a target that I won't be using. Alternatively, the ROCm stack could be bumped to 5.3.1; gfx1036 support was added in 5.2.0.

Reproducible: Always
Please attach the full build.log and emerge --info.
hipcc automatically detects all GPU architectures present and compiles for them. Your Zen 4 CPU contains a gfx1036 integrated GPU, which the ROCm 5.1.3 release does not cover. Although AMDGPU_TARGETS=gfx1030 is specified, it has no effect on CMake's test of the CXX compiler, so hipcc fails at the very first stage. Possible solutions:
1. Upgrade the ROCm toolchain.
2. Find a way to insert the --offload-arch=${AMDGPU_TARGETS} flag into CMake's test CXX command.
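Option 2 can be tried by hand before touching any ebuild. The following is only a sketch: it compiles a trivial file the way CMake's compiler check does, but with an explicit --offload-arch. The scratch directory and file names are hypothetical, and whether an explicit target makes a given hipcc version skip its own GPU detection is an assumption to verify.

```shell
# Hypothetical scratch directory and file names, for illustration only.
workdir="$(mktemp -d)"
printf 'int main() { return 0; }\n' > "$workdir/test.cxx"

# The target from AMDGPU_TARGETS in make.conf.
arch="gfx1030"

if command -v hipcc >/dev/null 2>&1; then
    # Assumption: naming the target explicitly keeps hipcc from adding
    # every detected GPU (including gfx1036) to the command line.
    if hipcc --offload-arch="$arch" -c "$workdir/test.cxx" -o "$workdir/test.o"; then
        echo "hipcc compiled the test program for $arch"
    else
        echo "hipcc still fails with an explicit $arch"
    fi
else
    echo "hipcc not found; nothing to test"
fi
```

If this succeeds where the plain `hipcc -c` invocation fails, the remaining work is getting the same flag into the command CMake runs for its compiler check.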
Created attachment 828203 [details] build log
I've since run emerge -e world with LTO, -fomg-optimize, etc. I'll set off a new one tonight with sane CFLAGS, re-confirm the issue, and post emerge --info in the morning. If the build log is meaningfully different I'll post a new copy of that, but I don't expect it to change. Honestly, it would probably be easiest to just version bump ROCm. The portage tree is on 5.1.3 and upstream is 5.3.1.
Created attachment 828207 [details] emerge --info
Created attachment 828209 [details] build log
A possible solution is to trick your system into believing that only the 6800XT is present. hipcc uses rocm_agent_enumerator to detect the GPUs present. You can override its detection with a self-specified target list: `echo gfx1030 > /tmp/amdgpu.list`, then `export ROCM_TARGET_LST=/tmp/amdgpu.list`. Then run hipcc or emerge a hip-based package, and see whether cmake stops complaining that hipcc is unable to compile.
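The override above can be written out as a short shell session. This is a sketch of the same two commands plus an optional check; the check assumes rocm_agent_enumerator is on PATH and is skipped otherwise.

```shell
# Target list file: one gfx target per line (path taken from the comment above).
echo gfx1030 > /tmp/amdgpu.list

# Point the ROCm tools at the list instead of letting them probe the hardware.
export ROCM_TARGET_LST=/tmp/amdgpu.list

# Optional check: with the override in place the enumerator should print
# the contents of the list rather than the detected agents.
if command -v rocm_agent_enumerator >/dev/null 2>&1; then
    rocm_agent_enumerator || echo "rocm_agent_enumerator returned an error"
else
    echo "rocm_agent_enumerator not found; skipping check"
fi
```

The variable has to be set in the environment that runs the build, so for emerge it needs to be exported in the same shell or otherwise passed through to the build environment.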
Progress! With the ROCM_TARGET_LST trick it now fails with the following error message:

FAILED: Tensile/lib/CMakeFiles/TensileHost.dir/source/hip/HipSolutionAdapter.cpp.o
/usr/bin/hipcc -DTENSILE_DEFAULT_SERIALIZATION -DTENSILE_MSGPACK=1 -DTENSILE_USE_HIP -D__HIP_PLATFORM_AMD__=1 -D__HIP_PLATFORM_HCC__=1 -I/usr/share/Tensile/Source/lib/include -O2 -march=native -pipe -ftree-vectorize -D__HIP_HCC_COMPAT_MODE__=1 -fPIC -Wno-unused-command-line-argument -Wno-unused-result -x hip --hip-device-lib-path=/usr/lib/amdgcn/bitcode --offload-arch=gfx1030 -std=c++14 -MD -MT Tensile/lib/CMakeFiles/TensileHost.dir/source/hip/HipSolutionAdapter.cpp.o -MF Tensile/lib/CMakeFiles/TensileHost.dir/source/hip/HipSolutionAdapter.cpp.o.d -o Tensile/lib/CMakeFiles/TensileHost.dir/source/hip/HipSolutionAdapter.cpp.o -c /usr/share/Tensile/Source/lib/source/hip/HipSolutionAdapter.cpp
In file included from /usr/share/Tensile/Source/lib/source/hip/HipSolutionAdapter.cpp:33:
In file included from /usr/share/Tensile/Source/lib/include/Tensile/EmbeddedData.hpp:30:
In file included from /usr/lib/gcc/x86_64-pc-linux-gnu/12/include/g++-v12/memory:77:
In file included from /usr/lib/gcc/x86_64-pc-linux-gnu/12/include/g++-v12/bits/shared_ptr.h:53:
/usr/lib/gcc/x86_64-pc-linux-gnu/12/include/g++-v12/bits/shared_ptr_base.h:196:22: error: use of undeclared identifier 'noinline'; did you mean 'inline'?
__attribute__((__noinline__))
^
/usr/include/hip/amd_detail/host_defines.h:50:37: note: expanded from macro '__noinline__'
#define __noinline__ __attribute__((noinline))
^
Created attachment 832201 [details] build.log with ROCM_TARGET_LST trick
(In reply to Nick Wallingford from comment #9)
> Created attachment 832201 [details]
> build.log with ROCM_TARGET_LST trick

So the trick solves this bug. The new failure is a duplicate of https://bugs.gentoo.org/857126. A fix for it is currently pending in https://github.com/gentoo/gentoo/pull/28144
Good to hear. Of course, if ROCm were updated to 5.3.1, both of these would be unnecessary; 5.3.1 targets llvm-15, and 5.3.1 supports gfx1036.
The Portage tree has now been updated to ROCm 5.3.3, which fixes this issue.