Bug 851702 - sys-devel/llvm-roc deprecation, ROCm-5.1.3 using vanilla llvm/clang
Summary: sys-devel/llvm-roc deprecation, ROCm-5.1.3 using vanilla llvm/clang
Status: RESOLVED FIXED
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: Current packages
Hardware: All Linux
Importance: Normal normal
Assignee: Craig Andrews
URL:
Whiteboard:
Keywords: PullRequest
Depends on:
Blocks: 693200
 
Reported: 2022-06-13 12:40 UTC by Yiyang Wu
Modified: 2024-01-27 21:47 UTC (History)
10 users (show)

See Also:
Package list:
Runtime testing required: ---


Description Yiyang Wu 2022-06-13 12:40:11 UTC
sys-devel/llvm-roc is a simple installation of the AMD-patched llvm-project, which does not fit into Gentoo's llvm slotting logic. That causes problems: when a program not only depends on hip but also (indirectly) links to the (system) llvm, things break (runtime error: Option xxx registered more than once!).

We need to redesign sys-devel/llvm-roc, probably making it into a slot of llvm, so programs won't link to two different llvm libraries.
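
The failure mode can be illustrated with a small model. This is a minimal sketch, not LLVM's actual code: `OptionRegistry`, `load_component`, and the option names `help` and `debug-pass` are hypothetical stand-ins for a process-wide command-line option table and for libraries that register their options when they are loaded.

```python
class DuplicateOptionError(RuntimeError):
    """Raised when an option name is registered twice in one process."""


class OptionRegistry:
    """Hypothetical stand-in for a process-wide option table."""

    def __init__(self):
        self._owners = {}  # option name -> component that registered it

    def register(self, name, owner):
        if name in self._owners:
            raise DuplicateOptionError(
                f"Option '{name}' registered more than once! "
                f"(first by {self._owners[name]}, again by {owner})"
            )
        self._owners[name] = owner


def load_component(registry, owner, option_names):
    """Model a library that registers its options at load time."""
    for name in option_names:
        registry.register(name, owner)


registry = OptionRegistry()
# The first llvm copy is loaded by the program itself.
load_component(registry, "system llvm", ["help", "debug-pass"])
try:
    # A second copy carrying the same options is pulled in indirectly.
    load_component(registry, "llvm-roc", ["help", "debug-pass"])
    outcome = "ok"
except DuplicateOptionError as err:
    outcome = str(err)
print(outcome)
```

In this model the second load fails on the first shared option name, which is why keeping a single llvm per process avoids the error.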

This bug blocks https://bugs.gentoo.org/693200

Reproducible: Always
Comment 1 Yiyang Wu 2022-06-13 13:11:16 UTC
Currently media-gfx/blender-3.2.0 is deeply affected (https://bugs.gentoo.org/693200#c9). In the future there may be more packages that use both llvm and ROCm.

As I see it, there are two ways:

1. Use the existing llvm-14, maybe with patches picked from ROCm's llvm. This may increase the maintenance burden for the subsequent hip packages, since upstream llvm is not guaranteed to work.
2. Use ROCm's llvm, but make it another slot. Packages that depend on both hip and llvm must only use this rocm slot. This means that once users decide to install rocm packages like blender, they have to rebuild mesa with the rocm slot.

I think Debian has provided useful information on using vanilla llvm instead of the ROCm-patched llvm:
 - https://github.com/ROCm-Developer-Tools/HIP/issues/2449
 - https://lists.debian.org/debian-ai/2022/05/msg00000.html
 - https://lists.debian.org/debian-ai/2022/03/msg00035.html
 - https://lists.debian.org/debian-ai/2022/03/msg00011.html

But I don't have a clear picture of whether they have decided to use upstream llvm (hip and comgr have not made it into experimental or sid yet).
Comment 2 Yiyang Wu 2022-06-13 13:18:26 UTC
> 1. Use the existing llvm-14, maybe with patches picked from ROCm's llvm.
> This may increase the maintenance burden for the subsequent hip packages,
> since upstream llvm is not guaranteed to work.
> 2. Use ROCm's llvm, but make it another slot. Packages that depend on both
> hip and llvm must only use this rocm slot. This means that once users decide
> to install rocm packages like blender, they have to rebuild mesa with the
> rocm slot.

Personally I prefer the second approach, because ROCm's llvm-project still has many changes that are not upstreamed, especially the OpenMP part (which may affect sci-libs/rocsparse, see https://github.com/gentoo/gentoo/pull/25318).
Comment 3 Yiyang Wu 2022-06-14 01:52:15 UTC
Fedora is also on its way to packaging ROCm with upstream llvm. They have packaged [rocm-comgr](https://src.fedoraproject.org/rpms/rocm-compilersupport).
Comment 4 Yiyang Wu 2022-06-14 09:46:33 UTC
Dear Michał Górny,

I am exploring the possibility of dropping sys-devel/llvm-roc and using standard llvm and clang as the backend of ROCm.

The first issue I encountered is that Gentoo's llvm has `BUILD_SHARED_LIBS=OFF`, which causes components to be built into libLLVM.so rather than each being a standalone libLLVM<component>.so.

Without standalone components, dev-libs/rocm-device-libs fails to build:
```
ld: cannot find -lLLVMCore
ld: cannot find -lLLVMBitReader
ld: cannot find -lLLVMBitWriter
```

I tried to set `BUILD_SHARED_LIBS=ON` and `LLVM_LINK_LLVM_DYLIB=OFF`, but get_distribution_components fails, since now there are lots of standalone components.

Of course I can patch rocm-device-libs so it just links libLLVM.so rather than the non-existent components, but that increases the maintenance burden of the ROCm packages. So I wonder: is there a reason Gentoo sets `BUILD_SHARED_LIBS=OFF`?

Thanks!

Best regards,
Yiyang Wu
Comment 5 Michał Górny archtester Gentoo Infrastructure gentoo-dev Security 2022-06-14 10:30:06 UTC
Upstream strongly discourages using BUILD_SHARED_LIBS, and recommends using the dylib instead.  The former is only meant to be used in specific development scenarios, mostly to reduce the cost of recompiling.

Both llvm-config and standard LLVM cmake macros should be perfectly happy with the dylib, and able to supply the right libraries when used correctly.  I don't know what ROCm does wrong but there's certainly a lot of other packages that get this right, so it must be fixable upstream.
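
The difference between the two build modes can be modelled roughly as follows. This is only a sketch under stated assumptions: `link_libraries` is a hypothetical helper, not part of llvm-config or the LLVM cmake macros, and the component names are the ones from the linker errors quoted in comment 4.

```python
def link_libraries(components, link_dylib):
    """Return linker flags for the requested LLVM components.

    Hypothetical helper: with a dylib build every component lives in the
    single libLLVM, otherwise each component is its own library.
    """
    if link_dylib:
        return ["-lLLVM"]
    return [f"-lLLVM{name}" for name in components]


needed = ["Core", "BitReader", "BitWriter"]
print(link_libraries(needed, link_dylib=False))
print(link_libraries(needed, link_dylib=True))
```

A build script that hard-codes the per-component names only works in the first mode, while one that asks the LLVM tooling for its libraries works in both.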
Comment 6 Yiyang Wu 2022-06-14 10:41:53 UTC
(In reply to Michał Górny from comment #5)
> Upstream strongly discourages using BUILD_SHARED_LIBS, and recommends using
> the dylib instead.  The former is only meant to be used in specific
> development scenarios, mostly to reduce the cost of recompiling.
> 
> Both llvm-config and standard LLVM cmake macros should be perfectly happy
> with the dylib, and able to supply the right libraries when used correctly. 
> I don't know what ROCm does wrong but there's certainly a lot of other
> packages that get this right, so it must be fixable upstream.

OK I'll patch ROCm and consult the ROCm upstream.
Comment 7 Yiyang Wu 2022-06-28 06:15:49 UTC
(In reply to perestoronin from comment https://bugs.gentoo.org/693200#c29)
> I have got new error while try to compile sci-libs/miopen v5.1.3:
> 
> CMake Error at CMakeLists.txt:309 (find_library):
>   Could not find LIBMLIRMIOPEN using the following names: MLIRMIOpen
> 
> Can you fix this error ?

At first glance, I think just adding the cmake option `-DMIOPEN_USE_MLIR=OFF` solves the issue. MIOpen-5.0.2 turns this option off by default. In MIOpen-5.1.3, however, there is complicated logic among the default values of the options: BUILD_SHARED_LIBS defaults to ON, so MIOPEN_USE_MLIR_DEFAULT=ON.

If we want to use LIBMLIRMIOPEN, we need to install the AMD-modified mlir project in llvm (https://github.com/ROCmSoftwarePlatform/llvm-project-mlir, branched from llvm-project/mlir in early 2021).
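
The default cascade described above can be sketched like this. It is a simplified model, not MIOpen's actual CMake logic: `resolve_use_mlir` is a hypothetical function, and the assumption that the default simply follows BUILD_SHARED_LIBS is mine; only the ON case is stated above.

```python
def resolve_use_mlir(build_shared_libs=True, miopen_use_mlir=None):
    """Return the effective MIOPEN_USE_MLIR value.

    build_shared_libs: BUILD_SHARED_LIBS, which defaults to ON in 5.1.3.
    miopen_use_mlir:   explicit -DMIOPEN_USE_MLIR=... value, or None if unset.
    """
    # Assumption: MIOPEN_USE_MLIR_DEFAULT follows BUILD_SHARED_LIBS.
    default = build_shared_libs
    # An explicit setting on the command line wins over the default.
    return default if miopen_use_mlir is None else miopen_use_mlir


print(resolve_use_mlir())                       # nothing set explicitly
print(resolve_use_mlir(miopen_use_mlir=False))  # -DMIOPEN_USE_MLIR=OFF
```

With nothing set, the model ends up with MLIR enabled, which is when find_library looks for MLIRMIOpen; passing the option explicitly turns it off.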
Comment 8 perestoronin 2022-06-28 07:15:23 UTC
(In reply to Yiyang Wu from comment #7)
> At first glance, I think just adding the cmake option
> `-DMIOPEN_USE_MLIR=OFF` solves the issue. MIOpen-5.0.2 turns this option
> off by default. In MIOpen-5.1.3, however, there is complicated logic among
> the default values of the options: BUILD_SHARED_LIBS defaults to ON, so
> MIOPEN_USE_MLIR_DEFAULT=ON.
> 
> If we want to use LIBMLIRMIOPEN, we need to install the AMD-modified mlir
> project in llvm
> (https://github.com/ROCmSoftwarePlatform/llvm-project-mlir, branched from
> llvm-project/mlir in early 2021).

With `-DMIOPEN_USE_MLIR=OFF` I got a new error:

CMake Error at CMakeLists.txt:300 (message):
  extractkernel not found
Comment 9 Yiyang Wu 2022-06-28 07:49:35 UTC
(In reply to perestoronin from comment #8)
> With `-DMIOPEN_USE_MLIR=OFF` I got a new error:
> 
> CMake Error at CMakeLists.txt:300 (message):
>   extractkernel not found

That's because cmake cannot find clang-offload-bundler. Line 45 has to be changed to the correct path by calling $(get_llvm_prefix ${LLVM_MAX_SLOT}), provided by llvm.eclass. Also, you need to append two cxxflags, `--rocm-path="${EPREFIX}"/usr` and `--hip-device-lib-path="${EPREFIX}"/usr/lib/amdgcn/bitcode`, to compile.
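
How these values fit together can be sketched as below. This is an illustration only: `hip_compile_flags` and `offload_bundler_path` are hypothetical helpers, and the prefix `/usr/lib/llvm/14` is an assumed example of what get_llvm_prefix would return for slot 14.

```python
def hip_compile_flags(eprefix=""):
    """Build the two extra cxxflags, relative to the given EPREFIX."""
    return [
        f"--rocm-path={eprefix}/usr",
        f"--hip-device-lib-path={eprefix}/usr/lib/amdgcn/bitcode",
    ]


def offload_bundler_path(llvm_prefix):
    """Locate clang-offload-bundler inside a slotted llvm prefix."""
    return f"{llvm_prefix}/bin/clang-offload-bundler"


# Assumed example prefix for llvm slot 14 on a non-prefix system.
print(offload_bundler_path("/usr/lib/llvm/14"))
print(" ".join(hip_compile_flags()))
```

Deriving the bundler path from the slot prefix rather than hard-coding it is what lets the same ebuild follow whichever llvm slot is selected.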
Comment 10 Mike Lothian 2022-07-05 23:11:24 UTC
I was wondering how this effort was progressing and if it's been made any easier with 5.2.0?
Comment 11 Yiyang Wu 2022-07-06 03:08:34 UTC
(In reply to Mike Lothian from comment #10)
> I was wondering how this effort was progressing and if it's been made any
> easier with 5.2.0?

It is progressing. Actually, dev-util/hip-5.1.3 is done. The next step is the sci-libs packages.

5.2.0 actually makes it more difficult: llvm/clang-14 is not enough. ROCm-5.2.0 supports new architectures and a new ABI (code object) version that clang-14 lacks (we may have to wait for clang-15).

Is there any killer feature in 5.2.0 compared to 5.1.3? If not, then I think 5.1.3 is a good enough version to build on llvm/clang-14.
Comment 12 perestoronin 2022-07-16 12:43:50 UTC
I tried to compile dev-libs/rccl-5.2.0, and I got an error:

ninja -v -j12 -l24
[1/55] /usr/bin/hipcc -DENABLE_COLLTRACE -D__HIP_PLATFORM_AMD__=1 -D__HIP_PLATFORM_HCC__=1 -Drccl_EXPORTS -I/var/tmp/portage/dev-libs/rccl-5.2.0/work/rccl-rocm-5.2.0_build/include -I/var/tmp/portage/dev-libs/rccl-5.2.0/work/rccl-rocm-5.2.0_build/include/rccl -I/var/tmp/portage/dev-libs/rccl-5.2.0/work/rccl-rocm-5.2.0/src -I/var/tmp/portage/dev-libs/rccl-5.2.0/work/rccl-rocm-5.2.0/src/include -I/var/tmp/portage/dev-libs/rccl-5.2.0/work/rccl-rocm-5.2.0/src/collectives -I/var/tmp/portage/dev-libs/rccl-5.2.0/work/rccl-rocm-5.2.0/src/collectives/device -I//hsa/include -I//rocm_smi/include  -O2 -pipe -march=znver2 -fPIC -fvisibility=hidden -fgpu-rdc -parallel-jobs=8 -Wno-format-nonliteral -x hip --hip-device-lib-path=/usr/lib64/amdgcn/bitcode --offload-arch=gfx900 -std=c++14 -MD -MT CMakeFiles/rccl.dir/src/collectives/device/functions.cpp.o -MF CMakeFiles/rccl.dir/src/collectives/device/functions.cpp.o.d -o CMakeFiles/rccl.dir/src/collectives/device/functions.cpp.o -c /var/tmp/portage/dev-libs/rccl-5.2.0/work/rccl-rocm-5.2.0_build/src/collectives/device/functions.cpp
FAILED: CMakeFiles/rccl.dir/src/collectives/device/functions.cpp.o 
/usr/bin/hipcc -DENABLE_COLLTRACE -D__HIP_PLATFORM_AMD__=1 -D__HIP_PLATFORM_HCC__=1 -Drccl_EXPORTS -I/var/tmp/portage/dev-libs/rccl-5.2.0/work/rccl-rocm-5.2.0_build/include -I/var/tmp/portage/dev-libs/rccl-5.2.0/work/rccl-rocm-5.2.0_build/include/rccl -I/var/tmp/portage/dev-libs/rccl-5.2.0/work/rccl-rocm-5.2.0/src -I/var/tmp/portage/dev-libs/rccl-5.2.0/work/rccl-rocm-5.2.0/src/include -I/var/tmp/portage/dev-libs/rccl-5.2.0/work/rccl-rocm-5.2.0/src/collectives -I/var/tmp/portage/dev-libs/rccl-5.2.0/work/rccl-rocm-5.2.0/src/collectives/device -I//hsa/include -I//rocm_smi/include  -O2 -pipe -march=znver2 -fPIC -fvisibility=hidden -fgpu-rdc -parallel-jobs=8 -Wno-format-nonliteral -x hip --hip-device-lib-path=/usr/lib64/amdgcn/bitcode --offload-arch=gfx900 -std=c++14 -MD -MT CMakeFiles/rccl.dir/src/collectives/device/functions.cpp.o -MF CMakeFiles/rccl.dir/src/collectives/device/functions.cpp.o.d -o CMakeFiles/rccl.dir/src/collectives/device/functions.cpp.o -c /var/tmp/portage/dev-libs/rccl-5.2.0/work/rccl-rocm-5.2.0_build/src/collectives/device/functions.cpp
/usr/lib/llvm/14/bin/clang-offload-bundler: error: '/var/tmp/portage/dev-libs/rccl-5.2.0/temp/functions-303183/functions-gfx900.bc': No such file or directory
clang-14: error: clang-offload-bundler command failed with exit code 1 (use -v to see invocation)

How to fix ?
Comment 13 perestoronin 2022-07-16 19:22:23 UTC
(In reply to perestoronin from comment #12)
> How to fix ?

fixed with patches wgetpaste rccl-namespace.patch rccl-nccl.patch 
https://gist.github.com/raw/838b5b8f28614a3c2202f30fc58aec26
Comment 14 Yiyang Wu 2022-07-17 01:43:02 UTC
(In reply to perestoronin from comment #13)
> (In reply to perestoronin from comment #12)
> > How to fix ?
> 
> fixed with patches wgetpaste rccl-namespace.patch rccl-nccl.patch 
> https://gist.github.com/raw/838b5b8f28614a3c2202f30fc58aec26

That's interesting. Can you explain a bit about this patch? And please post the build.log after this patch, and let's see why this error occurred and got mitigated. I can't reproduce it on rccl-5.1.3.
Comment 15 Yiyang Wu 2022-07-17 02:22:14 UTC
Also, while I'm packaging dev-libs/rccl-5.1.3 against rocm-5.1.3 (clang-14 based), I found a compilation error:

lld: error: ld-temp.o <inline asm>:1:26: specified hardware register is not supported on this GPU

when compiling for gfx1030 target.

After backporting https://reviews.llvm.org/D119939 to llvm, this is resolved.
Comment 16 perestoronin 2022-07-17 04:04:56 UTC
(In reply to Yiyang Wu from comment #14)
> (In reply to perestoronin from comment #13)
> > (In reply to perestoronin from comment #12)
> > > How to fix ?
> > 
> > fixed with patches wgetpaste rccl-namespace.patch rccl-nccl.patch   
> > https://gist.github.com/raw/838b5b8f28614a3c2202f30fc58aec26
> 
> That's interesting. Can you explain a bit about this patch? And please post
> > the build.log after this patch, and let's see why this error occurred and got
> mitigated. I can't reproduce it on rccl-5.1.3

rccl-5.1.3 builds without patches.

rccl-nccl.patch - replaces the obsolete pthread_yield with sched_yield

rccl-namespace.patch - fixes paths and the roc::rccl namespace, removes obsolete hcc, removes the constant parallel jobs = 8, removes unsupported hc function calls ...

wgetpaste build.log  https://gist.github.com/raw/834efe16d81f808fe0f61819a570ddf8
Comment 17 perestoronin 2022-07-17 04:18:43 UTC
(In reply to Yiyang Wu from comment #15)
> Also, while I'm packaging dev-libs/rccl-5.1.3 against rocm-5.1.3 (clang-14
> based), I found a compilation error:
> 
> lld: error: ld-temp.o <inline asm>:1:26: specified hardware register is not
> supported on this GPU
> 
> when compiling for gfx1030 target.
> 
> After backporting https://reviews.llvm.org/D119939 to llvm, this is resolved.

Thanks. I only have a gfx900 AMD Radeon Vega Frontier 16 GB, but I recompiled clang with this patch by putting the patches into /etc/portage/patches/sys-devel/clang. I also applied other necessary patches from the list:
wgetpaste 00-D69582.patch 01-D118949.patch 02-D119939.patch 03-D120557.patch 04-clang-declbase.patch 
https://gist.github.com/raw/326b80564355b686b965ff15331aca8c
Comment 18 perestoronin 2022-07-17 04:27:22 UTC
(In reply to perestoronin from comment #17)
> (In reply to Yiyang Wu from comment #15)
> > Also, while I'm packaging dev-libs/rccl-5.1.3 against rocm-5.1.3 (clang-14
> > based), I found a compilation error:
> > 
> > lld: error: ld-temp.o <inline asm>:1:26: specified hardware register is not
> > supported on this GPU
> > 
> > when compiling for gfx1030 target.
> > 
> > > After backporting https://reviews.llvm.org/D119939 to llvm, this is resolved.
> > 
> > Thanks, I have got only gfx900 AMD Radeon Vega Frontier 16Gb, but recompile
> > clang with this patch put patches to /etc/portage/patches/sys-devel/clang,
> > also I applied other necessary patches from list:
> wgetpaste 00-D69582.patch 01-D118949.patch 02-D119939.patch 03-D120557.patch
> 04-clang-declbase.patch 
> https://gist.github.com/raw/326b80564355b686b965ff15331aca8c

Correction: 02-D119939.patch should be relocated to /etc/portage/patches/sys-devel/llvm.
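
The resulting layout of user patches, as described in comments 17 and 18, can be written down as a small table. This is a sketch for illustration: `PATCH_TARGETS` and `patch_destination` are hypothetical names, and the mapping only restates which package each patch file was said to belong to.

```python
from pathlib import PurePosixPath

PATCH_ROOT = PurePosixPath("/etc/portage/patches")

# Package directory for each patch file, per comments 17 and 18.
PATCH_TARGETS = {
    "00-D69582.patch": "sys-devel/clang",
    "01-D118949.patch": "sys-devel/clang",
    "02-D119939.patch": "sys-devel/llvm",  # relocated in comment 18
    "03-D120557.patch": "sys-devel/clang",
    "04-clang-declbase.patch": "sys-devel/clang",
}


def patch_destination(patch_name):
    """Return the full path where the given patch file should be placed."""
    return PATCH_ROOT / PATCH_TARGETS[patch_name] / patch_name


for name in sorted(PATCH_TARGETS):
    print(patch_destination(name))
```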
Comment 19 Yiyang Wu 2022-07-17 04:45:54 UTC
(In reply to perestoronin from comment #17)

> Thanks, I have got only gfx900 AMD Radeon Vega Frontier 16Gb, but recompile
> clang with this patch put patches to /etc/portage/patches/sys-devel/clang,
> also I applied other necessary patches from list:
> wgetpaste 00-D69582.patch 01-D118949.patch 02-D119939.patch 03-D120557.patch
> 04-clang-declbase.patch 
> https://gist.github.com/raw/326b80564355b686b965ff15331aca8c

That's very helpful; it solves the major obstacles to upgrading to rocm-5.2.0 against llvm/clang-14.0.6.

If I understand correctly, these patches are meant to:

00-D69582.patch: support parallel jobs when compiling. ROCm packages suffer from long compilation times on some extra-large source files. For example, Kernels.cpp for rocBLAS can take 10 minutes to compile for a single GPU architecture, and for 6 architectures that is 1 hour. But parallel jobs for clang are quite a controversial topic, because the build system already uses multiprocessing, and adding parallel jobs violates MAKEOPTS.

01-D118949.patch: this enables code object v5; rocm-5.2 depends on this feature.

02-D119939.patch: that's what I previously mentioned; this patch fixes rccl compilation for RDNA2 cards.

03-D120557.patch: I think it's a fix. I have seen a rocm-device-libs-5.2.0 runtime failure, and I guess this fixes that (a compatibility issue).

As we can see, ROCm-5.2 uses a lot of features from the llvm/clang-15 main branch, so if we want to build the entire 5.2 stack on llvm/clang-14 we need to backport these patches, and maybe even more. For now I suggest we stay on 5.1.3; to use rocm-5.2.0 you can emerge clang-15.0.0.9999 to avoid patching llvm/clang heavily, I suppose.
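
The trade-off around parallel jobs can be put into numbers. This is a back-of-the-envelope sketch with hypothetical helper names; the inputs come from this thread (10 minutes per architecture, 6 architectures, `ninja -j12` and `-parallel-jobs=8` in the rccl build log), and it assumes each build-system job may itself start up to that many compiler jobs.

```python
def serial_compile_minutes(minutes_per_arch, arch_count):
    """Time to compile one source file for all architectures, one by one."""
    return minutes_per_arch * arch_count


def worst_case_processes(build_jobs, parallel_jobs):
    """Upper bound on concurrent jobs if every build job fans out fully."""
    return build_jobs * parallel_jobs


print(serial_compile_minutes(10, 6))  # 60 minutes, the 1 hour quoted above
print(worst_case_processes(12, 8))    # 96, versus the 12 jobs ninja was given
```

The first number is why the patch is attractive; the second is why it conflicts with the job limit the user set in MAKEOPTS.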
Comment 20 Benda Xu gentoo-dev 2022-07-17 05:30:10 UTC
(In reply to perestoronin from comment #12)
> I tried to compile dev-libs/rccl-5.2.0, and I got an error:
> 
> ninja -v -j12 -l24
> [1/55] /usr/bin/hipcc -DENABLE_COLLTRACE -D__HIP_PLATFORM_AMD__=1
> -D__HIP_PLATFORM_HCC__=1 -Drccl_EXPORTS
> -I/var/tmp/portage/dev-libs/rccl-5.2.0/work/rccl-rocm-5.2.0_build/include
> -I/var/tmp/portage/dev-libs/rccl-5.2.0/work/rccl-rocm-5.2.0_build/include/
> rccl -I/var/tmp/portage/dev-libs/rccl-5.2.0/work/rccl-rocm-5.2.0/src
> -I/var/tmp/portage/dev-libs/rccl-5.2.0/work/rccl-rocm-5.2.0/src/include
> -I/var/tmp/portage/dev-libs/rccl-5.2.0/work/rccl-rocm-5.2.0/src/collectives
> -I/var/tmp/portage/dev-libs/rccl-5.2.0/work/rccl-rocm-5.2.0/src/collectives/
> device -I//hsa/include -I//rocm_smi/include  -O2 -pipe -march=znver2 -fPIC
> -fvisibility=hidden -fgpu-rdc -parallel-jobs=8 -Wno-format-nonliteral -x hip
> --hip-device-lib-path=/usr/lib64/amdgcn/bitcode --offload-arch=gfx900
> -std=c++14 -MD -MT
> CMakeFiles/rccl.dir/src/collectives/device/functions.cpp.o -MF
> CMakeFiles/rccl.dir/src/collectives/device/functions.cpp.o.d -o
> CMakeFiles/rccl.dir/src/collectives/device/functions.cpp.o -c
> /var/tmp/portage/dev-libs/rccl-5.2.0/work/rccl-rocm-5.2.0_build/src/
> collectives/device/functions.cpp
> FAILED: CMakeFiles/rccl.dir/src/collectives/device/functions.cpp.o 
> /usr/bin/hipcc -DENABLE_COLLTRACE -D__HIP_PLATFORM_AMD__=1
> -D__HIP_PLATFORM_HCC__=1 -Drccl_EXPORTS
> -I/var/tmp/portage/dev-libs/rccl-5.2.0/work/rccl-rocm-5.2.0_build/include
> -I/var/tmp/portage/dev-libs/rccl-5.2.0/work/rccl-rocm-5.2.0_build/include/
> rccl -I/var/tmp/portage/dev-libs/rccl-5.2.0/work/rccl-rocm-5.2.0/src
> -I/var/tmp/portage/dev-libs/rccl-5.2.0/work/rccl-rocm-5.2.0/src/include
> -I/var/tmp/portage/dev-libs/rccl-5.2.0/work/rccl-rocm-5.2.0/src/collectives
> -I/var/tmp/portage/dev-libs/rccl-5.2.0/work/rccl-rocm-5.2.0/src/collectives/
> device -I//hsa/include -I//rocm_smi/include  -O2 -pipe -march=znver2 -fPIC
> -fvisibility=hidden -fgpu-rdc -parallel-jobs=8 -Wno-format-nonliteral -x hip
> --hip-device-lib-path=/usr/lib64/amdgcn/bitcode --offload-arch=gfx900
> -std=c++14 -MD -MT
> CMakeFiles/rccl.dir/src/collectives/device/functions.cpp.o -MF
> CMakeFiles/rccl.dir/src/collectives/device/functions.cpp.o.d -o
> CMakeFiles/rccl.dir/src/collectives/device/functions.cpp.o -c
> /var/tmp/portage/dev-libs/rccl-5.2.0/work/rccl-rocm-5.2.0_build/src/
> collectives/device/functions.cpp
> /usr/lib/llvm/14/bin/clang-offload-bundler: error:
> '/var/tmp/portage/dev-libs/rccl-5.2.0/temp/functions-303183/functions-gfx900.
> bc': No such file or directory
> clang-14: error: clang-offload-bundler command failed with exit code 1 (use
> -v to see invocation)
> 
> How to fix ?

Would you mind opening a new bug for ROCm 5.2.0 packages?  ROCm is a fast-moving target, and we are focusing this issue on vanilla clang.
Comment 21 perestoronin 2022-07-17 16:43:02 UTC
(In reply to Benda Xu from comment #20)
> (In reply to perestoronin from comment #12)
> > I tried to compile dev-libs/rccl-5.2.0, and I got an error:
> > 
> > ninja -v -j12 -l24
> > [1/55] /usr/bin/hipcc -DENABLE_COLLTRACE -D__HIP_PLATFORM_AMD__=1
> > -D__HIP_PLATFORM_HCC__=1 -Drccl_EXPORTS
> > -I/var/tmp/portage/dev-libs/rccl-5.2.0/work/rccl-rocm-5.2.0_build/include
> > -I/var/tmp/portage/dev-libs/rccl-5.2.0/work/rccl-rocm-5.2.0_build/include/
> > rccl -I/var/tmp/portage/dev-libs/rccl-5.2.0/work/rccl-rocm-5.2.0/src
> > -I/var/tmp/portage/dev-libs/rccl-5.2.0/work/rccl-rocm-5.2.0/src/include
> > -I/var/tmp/portage/dev-libs/rccl-5.2.0/work/rccl-rocm-5.2.0/src/collectives
> > -I/var/tmp/portage/dev-libs/rccl-5.2.0/work/rccl-rocm-5.2.0/src/collectives/
> > device -I//hsa/include -I//rocm_smi/include  -O2 -pipe -march=znver2 -fPIC
> > -fvisibility=hidden -fgpu-rdc -parallel-jobs=8 -Wno-format-nonliteral -x hip
> > --hip-device-lib-path=/usr/lib64/amdgcn/bitcode --offload-arch=gfx900
> > -std=c++14 -MD -MT
> > CMakeFiles/rccl.dir/src/collectives/device/functions.cpp.o -MF
> > CMakeFiles/rccl.dir/src/collectives/device/functions.cpp.o.d -o
> > CMakeFiles/rccl.dir/src/collectives/device/functions.cpp.o -c
> > /var/tmp/portage/dev-libs/rccl-5.2.0/work/rccl-rocm-5.2.0_build/src/
> > collectives/device/functions.cpp
> > FAILED: CMakeFiles/rccl.dir/src/collectives/device/functions.cpp.o 
> > /usr/bin/hipcc -DENABLE_COLLTRACE -D__HIP_PLATFORM_AMD__=1
> > -D__HIP_PLATFORM_HCC__=1 -Drccl_EXPORTS
> > -I/var/tmp/portage/dev-libs/rccl-5.2.0/work/rccl-rocm-5.2.0_build/include
> > -I/var/tmp/portage/dev-libs/rccl-5.2.0/work/rccl-rocm-5.2.0_build/include/
> > rccl -I/var/tmp/portage/dev-libs/rccl-5.2.0/work/rccl-rocm-5.2.0/src
> > -I/var/tmp/portage/dev-libs/rccl-5.2.0/work/rccl-rocm-5.2.0/src/include
> > -I/var/tmp/portage/dev-libs/rccl-5.2.0/work/rccl-rocm-5.2.0/src/collectives
> > -I/var/tmp/portage/dev-libs/rccl-5.2.0/work/rccl-rocm-5.2.0/src/collectives/
> > device -I//hsa/include -I//rocm_smi/include  -O2 -pipe -march=znver2 -fPIC
> > -fvisibility=hidden -fgpu-rdc -parallel-jobs=8 -Wno-format-nonliteral -x hip
> > --hip-device-lib-path=/usr/lib64/amdgcn/bitcode --offload-arch=gfx900
> > -std=c++14 -MD -MT
> > CMakeFiles/rccl.dir/src/collectives/device/functions.cpp.o -MF
> > CMakeFiles/rccl.dir/src/collectives/device/functions.cpp.o.d -o
> > CMakeFiles/rccl.dir/src/collectives/device/functions.cpp.o -c
> > /var/tmp/portage/dev-libs/rccl-5.2.0/work/rccl-rocm-5.2.0_build/src/
> > collectives/device/functions.cpp
> > /usr/lib/llvm/14/bin/clang-offload-bundler: error:
> > '/var/tmp/portage/dev-libs/rccl-5.2.0/temp/functions-303183/functions-gfx900.
> > bc': No such file or directory
> > clang-14: error: clang-offload-bundler command failed with exit code 1 (use
> > -v to see invocation)
> > 
> > How to fix ?
> 
> > Would you mind opening a new bug for ROCm 5.2.0 packages?  ROCm is a
> > fast-moving target, and we are focusing this issue on vanilla clang.

It is not a separate bug. I use rocm 5.2.0 with the old clang/llvm 14.0.6, which gives errors and requires patching clang and llvm, and rocm too. Gentoo also does not use the standard paths, among other reasons for trouble. If I use clang/llvm 15 and compile and install the artifacts to /opt/... , some patches become unnecessary. But when I want to compile tensorflow with rocm, I get an error about "-march" when compiling tensorflow with llvm-roc, so to resolve it I need to have only one llvm/clang on my computer to avoid this circle of hell.
Comment 22 perestoronin 2022-07-17 16:48:32 UTC
(In reply to Yiyang Wu from comment #19)
> (In reply to perestoronin from comment #17)
> 
> > Thanks, I have got only gfx900 AMD Radeon Vega Frontier 16Gb, but recompile
> > clang with this patch put patches to /etc/portage/patches/sys-devel/clang,
> > also I applied other necessary patches from list:
> > wgetpaste 00-D69582.patch 01-D118949.patch 02-D119939.patch 03-D120557.patch
> > 04-clang-declbase.patch 
> > https://gist.github.com/raw/326b80564355b686b965ff15331aca8c
> 
> That's very helpful, solving the major obstacles of upgrading to rocm-5.2.0
> against llvm/clang-14.0.6
> 
> If I understand correctly, these patches are meant to:
> 
> 00-D69582.patch: support parallel jobs when compiling. ROCm packages suffers
> from long compilation time on some extra large source files. For example,
> Kernels.cpp for rocBLAS can take 10m to compile for a single GPU
> architecture, and for 6 arch that is 1h. But parallel jobs for clang is a
> quite controversial topic, because build system already utilize the
> multiprocessing features, and adding parallel jobs violates MAKEOPTS.
> 
> 01-D118949.patch: this enables code-objects-v5, rocm-5.2 depends on this
> feature.
> 
> 02-D119939.patch: that's what I previously mentioned, this patch fixes rccl
> compilation for RDNA2 cards.
> 
> 03-D120557.patch: I think it's a fix. I have detected rocm-device-libs-5.2.0
> runtime failure, and I guess this fix that (compatibility issue).
> 
> As we can see the ROCm-5.2 uses a lot of features in llvm/clang-15 main
> branch, so if we want to build the entire 5.2 stack upon llvm/clang-14 we
> need to backport these patches, and maybe even more. Currently I suggest we
> can stay on 5.1.3; to use rocm-5.2.0 you can emerge the clang-15.0.0.9999 to
> avoid patching llvm/clang heavily, I suppose.

Yes, that is all correct.

PS: I will apply other new patches from llvm/clang 15+ to my system on top of clang/llvm 14.0.6 as necessary.
Comment 23 Mike Lothian 2022-07-27 08:29:03 UTC
I noticed llvm was updated with patches to make this easier, are we any closer to having 5.1 in tree? Is there an overlay with the development work?
Comment 24 Yiyang Wu 2022-07-27 08:45:24 UTC
(In reply to Mike Lothian from comment #23)
> I noticed llvm was updated with patches to make this easier, 

Oh, I hadn't noticed that. Can you give a reference?

> we any closer to having 5.1 in tree? Is there an overlay with the development work?

We are close to having the 5.1 toolchain in tree. See https://github.com/gentoo/gentoo/pull/26441

Only one bug remains: hip and rocm-comgr may break when upgrading clang (some paths are hard-coded, so hip and rocm-comgr need a rebuild after upgrading clang, but currently this is not triggered automatically; a non-hard-coded method is in development).

As for the math libraries, I have plenty at https://github.com/littlewu2508/gentoo/tree/rocm-5.1.3. They will be ready after https://github.com/gentoo/gentoo/pull/26441 gets merged.
Comment 25 Mike Lothian 2022-07-27 09:55:54 UTC
This was the LLVM update; it mentions ROCm: https://gitweb.gentoo.org/repo/gentoo.git/commit/sys-devel/llvm?id=c13c98b40beb7d18155a6a25ddfaf3d3ce6d81da

I'm just testing your ebuilds now, is there an updated rocm-opencl-runtime too?
Comment 26 Yiyang Wu 2022-07-27 10:13:56 UTC
(In reply to Mike Lothian from comment #25)
> This was the LLVM update; it mentions ROCm:
> https://gitweb.gentoo.org/repo/gentoo.git/commit/sys-devel/
> llvm?id=c13c98b40beb7d18155a6a25ddfaf3d3ce6d81da

This patch is for dev-libs/rccl-5.1.3 

(In reply to Yiyang Wu from comment #15)
> Also, while I'm packaging dev-libs/rccl-5.1.3 against rocm-5.1.3 (clang-14
> based), I found a compilation error:
> 
> lld: error: ld-temp.o <inline asm>:1:26: specified hardware register is not
> supported on this GPU
> 
> when compiling for gfx1030 target.
> 
> After backporting https://reviews.llvm.org/D119939 to llvm, this is resolved.


(In reply to Mike Lothian from comment #25)
> I'm just testing your ebuilds now, is there an updated rocm-opencl-runtime
> too?

I just committed it in https://github.com/littlewu2508/gentoo/commit/533ea5270ea9b8bbea88a38107314a93ab2fb755. src_test is still buggy (some tests need DISPLAY, but virtualx does not seem to work).
Comment 27 Mike Lothian 2022-07-27 10:37:34 UTC
Thanks, luxmark 3 works as long as I disable -cl-fast-relaxed-math. luxmark 4 crashed with an llvm error:

mesa: CommandLine Error: Option 'h' registered more than once!
LLVM ERROR: inconsistency in registered CommandLine options

Thread 1 "luxmark.bin" received signal SIGABRT, Aborted.
0x00007ffff0e90aec in ?? () from /lib64/libc.so.6
(gdb) bt
#0  0x00007ffff0e90aec in ?? () from /lib64/libc.so.6
#1  0x00007ffff0e3e772 in raise () from /lib64/libc.so.6
#2  0x00007ffff0e2846a in abort () from /lib64/libc.so.6
#3  0x00007fff93f36b8e in llvm::report_fatal_error(llvm::Twine const&, bool) () from /usr/lib/llvm/14/lib64/libLLVM-14.so
#4  0x00007fff93f36a36 in llvm::report_fatal_error(char const*, bool) () from /usr/lib/llvm/14/lib64/libLLVM-14.so
#5  0x00007fff93f165b2 in ?? () from /usr/lib/llvm/14/lib64/libLLVM-14.so
#6  0x00007fff93f0390f in llvm::cl::Option::addArgument() () from /usr/lib/llvm/14/lib64/libLLVM-14.so
#7  0x00007fffd4e5a087 in ?? () from /usr/lib64/libamd_comgr.so.2
#8  0x00007fffd4e13073 in ?? () from /usr/lib64/libamd_comgr.so.2
#9  0x00007ffff7fcbf6e in ?? () from /lib64/ld-linux-x86-64.so.2
#10 0x00007ffff7fcc05c in ?? () from /lib64/ld-linux-x86-64.so.2
#11 0x00007ffff0f5a243 in _dl_catch_exception () from /lib64/libc.so.6
#12 0x00007ffff7fd344f in ?? () from /lib64/ld-linux-x86-64.so.2
#13 0x00007ffff0f5a1ee in _dl_catch_exception () from /lib64/libc.so.6
#14 0x00007ffff7fd37f9 in ?? () from /lib64/ld-linux-x86-64.so.2
#15 0x00007ffff0e8ab98 in ?? () from /lib64/libc.so.6
#16 0x00007ffff0f5a1ee in _dl_catch_exception () from /lib64/libc.so.6
#17 0x00007ffff0f5a2a8 in _dl_catch_error () from /lib64/libc.so.6
#18 0x00007ffff0e8a659 in ?? () from /lib64/libc.so.6
#19 0x00007ffff0e8ac50 in dlopen () from /lib64/libc.so.6
#20 0x00007fffd770bf6d in ?? () from /usr/lib64/libamdocl64.so
#21 0x00007fffd76f7659 in ?? () from /usr/lib64/libamdocl64.so
#22 0x00007ffff0e93d8a in ?? () from /lib64/libc.so.6
#23 0x00007fffd76ede58 in ?? () from /usr/lib64/libamdocl64.so
#24 0x00007fffd7796858 in ?? () from /usr/lib64/libamdocl64.so
#25 0x00007fffd7795c80 in ?? () from /usr/lib64/libamdocl64.so
#26 0x00007fffd76edafc in ?? () from /usr/lib64/libamdocl64.so
#27 0x00007fffd778c9df in ?? () from /usr/lib64/libamdocl64.so
#28 0x00007fffd7697caa in ?? () from /usr/lib64/libamdocl64.so
#29 0x00007ffff0e93d8a in ?? () from /lib64/libc.so.6
#30 0x00007fffd7697ba6 in clIcdGetPlatformIDsKHR () from /usr/lib64/libamdocl64.so
#31 0x00007ffff5756296 in ?? () from /usr/lib64/libOpenCL.so.1
#32 0x00007ffff0e93d8a in ?? () from /lib64/libc.so.6
#33 0x00007ffff575b312 in clGetPlatformIDs () from /usr/lib64/libOpenCL.so.1
#34 0x0000555555d08fe0 in cl::Platform::get(std::vector<cl::Platform, std::allocator<cl::Platform> >*) ()
#35 0x0000555555d051c0 in luxrays::Context::Context(void (*)(char const*), luxrays::Properties const&) ()
#36 0x00005555557d3aa6 in luxcore::GetOpenCLDeviceDescs() ()
#37 0x0000555555794032 in HardwareTreeModel::HardwareTreeModel(MainWindow*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) ()
#38 0x000055555579cf7d in LuxMarkApp::Init(LuxMarkAppMode, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, char const*, bool, bool) ()
#39 0x000055555572c145 in main ()
(gdb)
Comment 28 Yiyang Wu 2022-07-27 11:02:11 UTC
(In reply to Mike Lothian from comment #27)
> Thanks, luxmark 3 works as long as I disable -cl-fast-relaxed-math. luxmark 4
> crashed with an llvm error:
> 
> mesa: CommandLine Error: Option 'h' registered more than once!
> LLVM ERROR: inconsistency in registered CommandLine options

This is commonly seen when multiple llvm versions are mixed together. Can you confirm only llvm-14 is installed?
Comment 29 Mike Lothian 2022-07-27 11:30:02 UTC
Yes, only one llvm 14.0.6

My guess would be llvm is already loaded with CommandLine options, then this comes along and either does the same ones, or incompatible ones
Comment 30 Mike Lothian 2022-07-27 11:35:49 UTC
Slightly better gdb output

mesa: CommandLine Error: Option 'h' registered more than once!
LLVM ERROR: inconsistency in registered CommandLine options

Thread 1 "luxmark.bin" received signal SIGABRT, Aborted.
0x00007ffff0e90aec in ?? () from /lib64/libc.so.6
(gdb) bt
#0  0x00007ffff0e90aec in ?? () from /lib64/libc.so.6
#1  0x00007ffff0e3e772 in raise () from /lib64/libc.so.6
#2  0x00007ffff0e2846a in abort () from /lib64/libc.so.6
#3  0x00007fff97f36b8e in llvm::report_fatal_error(llvm::Twine const&, bool) () from /usr/lib/llvm/14/lib64/libLLVM-14.so
#4  0x00007fff97f36a36 in llvm::report_fatal_error(char const*, bool) () from /usr/lib/llvm/14/lib64/libLLVM-14.so
#5  0x00007fff97f165b2 in ?? () from /usr/lib/llvm/14/lib64/libLLVM-14.so
#6  0x00007fff97f0390f in llvm::cl::Option::addArgument() () from /usr/lib/llvm/14/lib64/libLLVM-14.so
#7  0x00007fffb148533b in llvm::cl::alias::done (this=0x7fffb1e0be60 <SectionHeadersShorter>) at /usr/lib/llvm/14/include/llvm/Support/CommandLine.h:1910
#8  0x00007fffb14884bc in llvm::cl::alias::alias<char [2], llvm::cl::desc, llvm::cl::aliasopt> (this=0x7fffb1e0be60 <SectionHeadersShorter>) at /usr/lib/llvm/14/include/llvm/Support/CommandLine.h:1928
#9  0x00007fffb1481715 in __static_initialization_and_destruction_0 (__initialize_p=1, __priority=65535)
    at /var/tmp/portage/dev-libs/rocm-comgr-5.1.3/work/ROCm-CompilerSupport-rocm-5.1.3/lib/comgr/src/comgr-objdump.cpp:180
#10 0x00007fffb148259d in _GLOBAL__sub_I_comgr_objdump.cpp(void) () at /var/tmp/portage/dev-libs/rocm-comgr-5.1.3/work/ROCm-CompilerSupport-rocm-5.1.3/lib/comgr/src/comgr-objdump.cpp:2440
#11 0x00007ffff7fcbf6e in ?? () from /lib64/ld-linux-x86-64.so.2
#12 0x00007ffff7fcc05c in ?? () from /lib64/ld-linux-x86-64.so.2
#13 0x00007ffff0f5a243 in _dl_catch_exception () from /lib64/libc.so.6
#14 0x00007ffff7fd344f in ?? () from /lib64/ld-linux-x86-64.so.2
#15 0x00007ffff0f5a1ee in _dl_catch_exception () from /lib64/libc.so.6
#16 0x00007ffff7fd37f9 in ?? () from /lib64/ld-linux-x86-64.so.2
#17 0x00007ffff0e8ab98 in ?? () from /lib64/libc.so.6
#18 0x00007ffff0f5a1ee in _dl_catch_exception () from /lib64/libc.so.6
#19 0x00007ffff0f5a2a8 in _dl_catch_error () from /lib64/libc.so.6
#20 0x00007ffff0e8a659 in ?? () from /lib64/libc.so.6
#21 0x00007ffff0e8ac50 in dlopen () from /lib64/libc.so.6
#22 0x00007fffd450bf6d in ?? () from /usr/lib64/libamdocl64.so
#23 0x00007fffd44f7659 in ?? () from /usr/lib64/libamdocl64.so
#24 0x00007ffff0e93d8a in ?? () from /lib64/libc.so.6
#25 0x00007fffd44ede58 in ?? () from /usr/lib64/libamdocl64.so
#26 0x00007fffd4596858 in ?? () from /usr/lib64/libamdocl64.so
#27 0x00007fffd4595c80 in ?? () from /usr/lib64/libamdocl64.so
#28 0x00007fffd44edafc in ?? () from /usr/lib64/libamdocl64.so
#29 0x00007fffd458c9df in ?? () from /usr/lib64/libamdocl64.so
#30 0x00007fffd4497caa in ?? () from /usr/lib64/libamdocl64.so
#31 0x00007ffff0e93d8a in ?? () from /lib64/libc.so.6
#32 0x00007fffd4497ba6 in clIcdGetPlatformIDsKHR () from /usr/lib64/libamdocl64.so
#33 0x00007ffff5756296 in ?? () from /usr/lib64/libOpenCL.so.1
#34 0x00007ffff0e93d8a in ?? () from /lib64/libc.so.6
#35 0x00007ffff575b312 in clGetPlatformIDs () from /usr/lib64/libOpenCL.so.1
#36 0x0000555555d08fe0 in cl::Platform::get(std::vector<cl::Platform, std::allocator<cl::Platform> >*) ()
#37 0x0000555555d051c0 in luxrays::Context::Context(void (*)(char const*), luxrays::Properties const&) ()
#38 0x00005555557d3aa6 in luxcore::GetOpenCLDeviceDescs() ()
#39 0x0000555555794032 in HardwareTreeModel::HardwareTreeModel(MainWindow*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) ()
#40 0x000055555579cf7d in LuxMarkApp::Init(LuxMarkAppMode, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, char const*, bool, bool) ()
#41 0x000055555572c145 in main ()
(gdb) Quit
Comment 31 Mike Lothian 2022-08-01 13:07:36 UTC
So it is clashing with radeonsi's usage of llvm; forcing softpipe allows the app to run just fine: LIBGL_ALWAYS_SOFTWARE=1 GALLIUM_DRIVER=softpipe
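
A rough way to see which LLVM library files a running process has mapped is to read /proc/<pid>/maps. The sketch below is Linux-only Python; the helper names are hypothetical, and matching on "LLVM" in the file name is an assumption that would miss LLVM code statically linked into another library.

```python
def llvm_objects_in_maps(maps_text, needle="LLVM"):
    """Return the sorted, distinct mapped file paths whose file name contains `needle`."""
    paths = set()
    for line in maps_text.splitlines():
        # Fields: address perms offset dev inode [pathname]
        fields = line.split(None, 5)
        if len(fields) == 6:
            path = fields[5].strip()
            # Skip pseudo-entries such as [stack] or [heap].
            if path.startswith("/") and needle in path.rsplit("/", 1)[-1]:
                paths.add(path)
    return sorted(paths)


def llvm_objects_for_pid(pid):
    """Read /proc/<pid>/maps and list the matching mapped objects."""
    with open(f"/proc/{pid}/maps") as maps_file:
        return llvm_objects_in_maps(maps_file.read())
```

For a process stopped in gdb, `llvm_objects_for_pid(<pid>)` would list each distinct libLLVM file mapped at that moment. A single entry does not rule out the collision, since two users of the same libLLVM can still register the same option.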
Comment 32 Larry the Git Cow gentoo-dev 2022-08-06 14:23:12 UTC
The bug has been closed via the following commit(s):

https://gitweb.gentoo.org/repo/gentoo.git/commit/?id=82a6c2ca05ccf2dad8cbd75a813d6deafe4f105f

commit 82a6c2ca05ccf2dad8cbd75a813d6deafe4f105f
Author:     Yiyang Wu <xgreenlandforwyy@gmail.com>
AuthorDate: 2022-06-15 12:42:07 +0000
Commit:     Benda Xu <heroxbd@gentoo.org>
CommitDate: 2022-08-06 14:22:03 +0000

    dev-util/hip: add 5.1.3
    
    Switch from llvm-roc to vanilla clang --
    New variables about clang path in hipvars.pm
    hip-5.1.3-clang-include-path.patch to fix hipcc finding clang
    hip-5.1.3-rocm-path.patch: add compile flag to support unpatched clang
    Using sed cmd to fix clang header location in cmake
    
    Closes: https://bugs.gentoo.org/851702
    Reference: https://github.com/ROCm-Developer-Tools/hipamd/issues/18
    Reference: https://github.com/ROCm-Developer-Tools/hipamd/issues/27
    Signed-off-by: Yiyang Wu <xgreenlandforwyy@gmail.com>
    Signed-off-by: Benda Xu <heroxbd@gentoo.org>

 dev-util/hip/Manifest                              |   6 +
 ...0001-SWDEV-316128-HIP-surface-API-support.patch |  35 +++++
 .../hip/files/hip-5.1.3-clang-include-path.patch   |  12 ++
 .../hip/files/hip-5.1.3-fix-hip_prof_gen.patch     |  38 +++++
 dev-util/hip/files/hip-5.1.3-rocm-path.patch       |  13 ++
 dev-util/hip/files/hipvars-5.1.3.pm                |  21 +++
 dev-util/hip/hip-5.1.3.ebuild                      | 161 +++++++++++++++++++++
 7 files changed, 286 insertions(+)

Additionally, it has been referenced in the following commit(s):

https://gitweb.gentoo.org/repo/gentoo.git/commit/?id=8a64d9b3fa74ab7ee3ec9b4d85f813d63648a130

commit 8a64d9b3fa74ab7ee3ec9b4d85f813d63648a130
Author:     Benda Xu <heroxbd@gentoo.org>
AuthorDate: 2022-08-06 13:47:56 +0000
Commit:     Benda Xu <heroxbd@gentoo.org>
CommitDate: 2022-08-06 14:22:32 +0000

    dev-util/rocm-clang-ocl: use system clang.
    
    Bug: https://bugs.gentoo.org/851702
    Package-Manager: Portage-3.0.30, Repoman-3.0.3
    Signed-off-by: Benda Xu <heroxbd@gentoo.org>

 .../files/rocm-clang-ocl-5.0.2-system-llvm.patch        | 17 +++++++++++++++++
 ...-ocl-5.0.2.ebuild => rocm-clang-ocl-5.0.2-r1.ebuild} |  9 +++++----
 2 files changed, 22 insertions(+), 4 deletions(-)
Comment 33 Benda Xu gentoo-dev 2022-08-06 14:25:26 UTC
(In reply to Mike Lothian from comment #31)
> So it is clashing with radeonsi's usage of llvm; forcing softpipe allows the
> app to run just fine: LIBGL_ALWAYS_SOFTWARE=1 GALLIUM_DRIVER=softpipe

Thanks, Mike, for sharing your findings. I have merged Yiyang's ROCm version that uses the system's vanilla llvm into the tree as 5.1.3. Please check whether the conflict between radeonsi's and ROCm's usage of llvm still exists, and open a new bug for it if so.

Yours,
Benda
Comment 34 Paul Gover 2023-02-22 12:59:11 UTC
In an idle moment, I tried running "clinfo".  In return I got:
  mesa: CommandLine Error: Option 'h' registered more than once!
  LLVM ERROR: inconsistency in registered CommandLine options
  Aborted
and a search found this fixed bug.  However, as far as I can tell, I have only one LLVM installed, and everything relevant looks newer than the levels herein, specifically:
  equery list '*llvm*'
  [IP-] [  ] sys-devel/llvm-15.0.7:15/15
  [IP-] [  ] sys-devel/llvm-common-15.0.7:0
  [IP-] [  ] sys-devel/llvm-toolchain-symlinks-15-r1:15
  [IP-] [  ] sys-devel/llvmgold-15:0

  equery list mesa rocm-opencl-runtime
  [IP-] [  ] media-libs/mesa-22.2.5:0
  [IP-] [  ] dev-libs/rocm-opencl-runtime-5.3.3-r1:0/5.3

and no hip or rocm-clang-ocl.

I found a fix on the internet, but it doesn't work:
  LIBGL_ALWAYS_SOFTWARE=1 GALLIUM_DRIVER=softpipe clinfo
produces the same results as above.  "rocminfo" works OK.

I've no idea if this is actually important; it just looks suspicious!
Comment 35 Yiyang Wu 2023-02-22 13:33:14 UTC
(In reply to Paul Gover from comment #34)
> In an idle moment, I tried running "clinfo".  In return I got:
>   mesa: CommandLine Error: Option 'h' registered more than once!
>   LLVM ERROR: inconsistency in registered CommandLine options
>   Aborted
> and a search found this fixed bug.  However, as far as I can tell, I have
> only one LLVM installed, and everything relevant looks newer than the levels
> herein, specifically:
>   equery list '*llvm*'
>   [IP-] [  ] sys-devel/llvm-15.0.7:15/15
>   [IP-] [  ] sys-devel/llvm-common-15.0.7:0
>   [IP-] [  ] sys-devel/llvm-toolchain-symlinks-15-r1:15
>   [IP-] [  ] sys-devel/llvmgold-15:0
> 
>   equery list mesa rocm-opencl-runtime
>   [IP-] [  ] media-libs/mesa-22.2.5:0
>   [IP-] [  ] dev-libs/rocm-opencl-runtime-5.3.3-r1:0/5.3
> 
> and no hip or rocm-clang-ocl.
> 
> I found a fix on the internet, but it doesn't work:
>   LIBGL_ALWAYS_SOFTWARE=1 GALLIUM_DRIVER=softpipe clinfo
> produces the same results as above.  "rocminfo" works OK.
> 
> I've no idea if this is actually important; it just looks suspicious!

That seems strange. Can you try re-emerging rocm-comgr and rocm-opencl-runtime and see whether that resolves it?
Comment 36 perestoronin 2023-02-23 17:34:04 UTC
(In reply to Paul Gover from comment #34)
> In an idle moment, I tried running "clinfo".  In return I got:
>   mesa: CommandLine Error: Option 'h' registered more than once!
>   LLVM ERROR: inconsistency in registered CommandLine options
>   Aborted
> and a search found this fixed bug.  However, as far as I can tell, I have
> only one LLVM installed, and everything relevant looks newer than the levels
> herein, specifically:
> 
>   equery list mesa rocm-opencl-runtime
>   [IP-] [  ] media-libs/mesa-22.2.5:0
>   [IP-] [  ] dev-libs/rocm-opencl-runtime-5.3.3-r1:0/5.3
> 
> and no hip or rocm-clang-ocl.
> 
> I've no idea if this is actually important; it just looks suspicious!

Strange, everything works fine for me:

dev-util/hip-5.4.3 (/usr/bin/hipcc)
dev-libs/rocm-opencl-runtime-5.4.3 (/usr/bin/clinfo)
Comment 37 Yiyang Wu 2023-02-24 01:07:09 UTC
(In reply to Paul Gover from comment #34)
> In an idle moment, I tried running "clinfo".  In return I got:
>   mesa: CommandLine Error: Option 'h' registered more than once!
>   LLVM ERROR: inconsistency in registered CommandLine options
>   Aborted
> and a search found this fixed bug.  However, as far as I can tell, I have
> only one LLVM installed, and everything relevant looks newer than the levels
> herein, specifically:

Blender has the same issue with ROCm-5.3.3 (using the same clang). It is solved by backporting https://github.com/RadeonOpenCompute/ROCm-CompilerSupport/commit/2d05f9e480cbc591a6b888dfd49d9f7ef1bef25f

Maybe this can help, although we don't know why you are the only one who encounters this in clinfo.
Comment 38 Larry the Git Cow gentoo-dev 2023-03-07 07:57:15 UTC
The bug has been referenced in the following commit(s):

https://gitweb.gentoo.org/repo/gentoo.git/commit/?id=82a2720349d070fa86090fd9434bcfae75260a68

commit 82a2720349d070fa86090fd9434bcfae75260a68
Author:     Yiyang Wu <xgreenlandforwyy@gmail.com>
AuthorDate: 2023-03-01 02:54:09 +0000
Commit:     Sam James <sam@gentoo.org>
CommitDate: 2023-03-07 07:56:59 +0000

    dev-libs/rocm-comgr: Fix comgr and mesa LLVM option collision
    
    >=dev-libs/rocm-comgr-5.3 and <=9999 needs backport a patch from
    upstream to avoid register -h command line option, which resolves
    conflicts with media-libs/mesa. Benefits media-gfx/blender.
    
    Bug: https://bugs.gentoo.org/851702
    Reference: https://github.com/gentoo/gentoo/pull/27552
    Signed-off-by: Yiyang Wu <xgreenlandforwyy@gmail.com>
    Closes: https://github.com/gentoo/gentoo/pull/29866
    Signed-off-by: Sam James <sam@gentoo.org>

 .../files/rocm-comgr-5.3.3-remove-h-option.patch   | 43 ++++++++++++++++++++++
 ...-5.3.3-r1.ebuild => rocm-comgr-5.3.3-r2.ebuild} |  1 +
 ...mgr-5.4.3.ebuild => rocm-comgr-5.4.3-r1.ebuild} |  1 +
 3 files changed, 45 insertions(+)