Currently the blender ebuilds just find the latest installed LLVM, when it should be possible to select a specific one. Add a rocm USE flag so that blender can be built against llvm-roc / llvm:roc.
I do not have access to an AMD GPU and would be unable to develop this myself. However I would be happy to review a patch.
You don't need access to an AMD GPU to test it, you need access to a USE flag! This is an ebuild problem.
If a straight swap of sys-devel/llvm for sys-devel/llvm-roc is all that is required, it might be possible to add this feature by putting

llvm? ( rocm? ( sys-devel/llvm-roc:= ) !rocm? ( sys-devel/llvm:= ) )

in RDEPEND and

rocm? ( llvm opencl )

in REQUIRED_USE.

My concern is whether llvm-roc might be used during rendering for creating the OpenCL kernel or for compiling the OSL shaders. Looking at the GitHub page, there are also a lot of ROCm libraries, and I don't know whether some of these might be required as well. Without hardware I can only test whether emerge blender is successful, not whether blender crashes during rendering. This should be developed and tested by someone with hardware so I can be sure it works prior to integration.

The main advantage of the llvm eclass seems to be the ability to limit the maximum version of llvm to use when several are installed; however, blender can use all versions of llvm from the tree. It seems that it is not yet possible to specify use of llvm-roc with the eclass, pending bug #693198.
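The wiring proposed in this comment could be sketched as an ebuild fragment roughly like the one below. This is an untested sketch under assumptions: the `rocm` flag is hypothetical, `llvm` and `opencl` are assumed to be existing USE flags of the blender ebuild, and nothing here shows that a plain swap of the LLVM provider is sufficient at runtime.

```shell
# Hypothetical fragment for media-gfx/blender; not a tested ebuild.
# Assumes the ebuild already declares the llvm and opencl USE flags.
IUSE+=" rocm"

# rocm only makes sense when both llvm and opencl are enabled.
REQUIRED_USE+="
	rocm? ( llvm opencl )"

# Pick exactly one LLVM provider depending on the rocm flag.
RDEPEND+="
	llvm? (
		rocm? ( sys-devel/llvm-roc:= )
		!rocm? ( sys-devel/llvm:= )
	)"
```

The `:=` slot operator asks the package manager to rebuild blender when the selected LLVM slot changes, which matters here because the two providers are not interchangeable at the binary level.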
(In reply to Adrian from comment #3)
> If a straight swap of sys-devel/llvm for sys-devel/llvm-roc is all that is
> required, it might be possible to add this feature by putting
> llvm? ( rocm? ( sys-devel/llvm-roc:= ) !rocm? ( sys-devel/llvm:= ) ) in
> RDEPEND and rocm? ( llvm opencl ) in REQUIRED_USE

Sadly no, blender doesn't work with rocm and amdgpu. When the cycles USE flag is enabled together with rocm-opencl, the system llvm and the rocm llvm get mixed. The system llvm is pulled in by mesa and interferes with the rocm llvm used for opencl. I've tried to get it to work, but I always get this error:

mesa: CommandLine Error: Option 'help-list' registered more than once!
LLVM ERROR: inconsistency in registered CommandLine options

I'm afraid that as long as rocm is independent from upstream llvm, this doesn't really work.
LLVM upstream bug reports:
https://bugs.llvm.org/show_bug.cgi?id=30587
https://bugs.llvm.org/show_bug.cgi?id=22952
I'd suggest waiting a bit until OpenCL and AMD somehow settle on anything anywhere. From my findings, I have one machine working with the OpenCL libs from the amdgpu-pro drivers, and sadly no luck on the other machine. What I know for sure is that it does not work with rocm or amdgpu (or mesa?) OpenCL. So even if you had a usable USE flag, you'd probably end up with non-working OpenCL. I'm open to some testing or providing more details (I have both AMD and nvidia and use Blender from the cg overlay on a daily basis).
(In reply to Martin Rott from comment #6)
> I'd suggest to wait a bit until opencl and AMD somehow settles on anything
> anywhere.

I think it was pretty much decided not to add this flag. TBH, it should be possible to use any OpenCL installed, but I'm not sure how the ICD is supposed to work with CL.

> From my findings I have one machine working with opencl libs from amdgpu-pro
> drivers, sadly no luck on other machine - what I know for sure, it does not
> work with rocm or amdgpu(or mesa?) opencl. So even if you had a usable use
> flag, you'll probably end with not working opencl.

Yeah, it's an absolute joke tbf. Once I'm in a better position, I intend to learn AMD's assembly language and hope to help with the Clover stuff to get a full OpenCL implementation, if it's not done by then :/ They've dropped my hw completely from ROCm, and since there is zero information, I've no idea whether it would be possible to retrofit the work onto it as an external developer.

> I'm open to some testing or providing more details.. (having both AMD and
> nvidia and using Blender(from cg overlay) on daily basis.
Hello, I have made some efforts on rocm support for blender-3.2.0, which is officially supported by ROCm on Linux (although only RDNA cards are supported). My work is located at https://github.com/littlewu2508/gentoo/tree/blender-rocm and currently contains 3 commits: a simple version bump of media-gfx/openvdb, blender-3.2.0.ebuild with rocm enabled, and some nasty hacks to resolve the multiple llvm instances caused by sys-devel/llvm-roc.

The compilation is smooth (calling hipcc to compile the Cycles kernels to fatbin binaries is successful), but blender simply breaks at runtime when trying to call HIP Cycles. Currently I'm blocked by:

: CommandLine Error: Option 'use-dbg-addr' registered more than once!
LLVM ERROR: inconsistency in registered CommandLine options

with SIGABRT. This is a common issue when multiple versions of llvm are mixed. I unmerged all llvm and clang packages so that only sys-devel/llvm-roc remained for the HIP compiler. Those common registered-more-than-once errors were then gone, but another one remained:

CommandLine Error: Option 'limited-coverage-experimental' registered more than once!

By searching, I found that this string is contained in both /usr/lib/llvm/roc/lib/libclang-cpp.so.14roc and /usr/lib/llvm/roc/lib/libclangCodeGen.so.14roc, so maybe that's why the conflict exists. I brutally removed libclangCodeGen.so.14roc, and the error was gone but was replaced by an invalid pointer bug.

So the situation is: compilation of blender with rocm seems OK, but sys-devel/llvm-roc brings in another llvm that does not follow the standard Gentoo llvm slotting rules, which breaks a lot.

**Conclusion: a little progress on packaging blender-3.2 with rocm, and compilation seems OK; sys-devel/llvm-roc needs fixes to get things to work. Stay tuned.**
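The search described in this comment (finding which libraries embed a given option name) can be approximated with a small shell helper. This is a minimal sketch: the function name `find_option_owners` is made up for illustration, and it only does a plain string search, so a match shows that the string is present in the file, not that the option is actually registered from it.

```shell
#!/bin/sh
# List every shared object under a directory that embeds a given
# LLVM command-line option name (plain fixed-string search).
# Usage: find_option_owners <option-name> <library-dir>
find_option_owners() {
    opt=$1
    dir=$2
    # -a: treat binaries as text, -l: print file names only,
    # -F: fixed string, -e: pattern may start with a dash
    find "$dir" -type f -name '*.so*' -exec grep -a -l -F -e "$opt" {} + | sort
}

# Example, using the directory named in the comment above:
# find_option_owners limited-coverage-experimental /usr/lib/llvm/roc/lib
```

If the same option name shows up in two libraries that end up loaded into one process, that is a plausible source of the "registered more than once" abort.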
You probably need to statically link in the special rocm llvm version to the HIP runtime. Otherwise it will crash when any program uses the system wide llvm version.
(In reply to Sebastian Parborg from comment #10)
> You probably need to statically link in the special rocm llvm version to the
> HIP runtime.
>
> Otherwise it will crash when any program uses the system wide llvm version.

Does that mean that if a program is linked to llvm:n, then all of its dependencies have to link to llvm:n as well?
(In reply to Wu Yiyang from comment #11)
> Does that mean, if one is linked to llvm:n, then all its dependencies has to
> link to llvm:n?

Yes. If a program is dynamically linked to llvm version X and a library that the program uses is dynamically linked to llvm version Y, it will crash because of a namespace collision. (The functions and namespaces are the same between llvm versions, so the program will not know which dynamic library it should call.)

I've run into this issue in the past when the Mesa drivers and some of Blender's dependencies were built with different llvm versions.
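One way to check for the mixing described here is to list the LLVM and clang shared objects mapped into a running process. The helper below is a sketch, assuming a Linux `/proc/<pid>/maps` file as input; the function name `mapped_llvm_libs` is invented for this example, and paths containing spaces are not handled.

```shell
#!/bin/sh
# Print the distinct LLVM/clang shared objects mapped into a process,
# given a file in /proc/<pid>/maps format.
# Usage: mapped_llvm_libs /proc/<pid>/maps
mapped_llvm_libs() {
    # Field 6 of a maps line is the backing file path, if there is one.
    awk '$6 ~ "/lib(LLVM|clang)[^/]*[.]so" { print $6 }' "$1" | sort -u
}
```

More than one distinct libLLVM path in the output for a single process is a hint that two LLVM builds have been loaded side by side.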
I get the same error while trying to compile tensorflow with rocm support, as described in https://stackoverflow.com/questions/72510724/tensorflow-build-from-sources-with-frag-rocm-failed-with-error-tf-to-kernel-f

valgrind ./tf_to_kernel ...
==3134537== Memcheck, a memory error detector
==3134537== Copyright (C) 2002-2022, and GNU GPL'd, by Julian Seward et al.
==3134537== Using Valgrind-3.19.0 and LibVEX; rerun with -h for copyright info
==3134537== Command: ./tf_to_kernel ...
==3134537==
: CommandLine Error: Option 'march' registered more than once!
LLVM ERROR: inconsistency in registered CommandLine options
==3134537==
==3134537== Process terminating with default action of signal 6 (SIGABRT): dumping core
==3134537==    at 0x6E2A9EC: __pthread_kill_implementation (in /lib64/libc.so.6)
==3134537==    by 0x6DDD7A1: raise (in /lib64/libc.so.6)
==3134537==    by 0x6DC81E8: abort (in /lib64/libc.so.6)
==3134537==    by 0x5CBFD05: llvm::report_fatal_error(llvm::Twine const&, bool) (in /var/tmp/portage/sci-libs/tensorflow-2.9.1-r3/work/tensorflow-2.9.1-bazel-base/execroot/org_tensorflow/bazel-out/k8-opt-exec-50AE0418/bin/tensorflow/libtensorflow_framework.so.2.9.1)
==3134537==    by 0x5CBFE5A: llvm::report_fatal_error(char const*, bool) (in /var/tmp/portage/sci-libs/tensorflow-2.9.1-r3/work/tensorflow-2.9.1-bazel-base/execroot/org_tensorflow/bazel-out/k8-opt-exec-50AE0418/bin/tensorflow/libtensorflow_framework.so.2.9.1)
==3134537==    by 0x5CA4A83: (anonymous namespace)::CommandLineParser::addOption(llvm::cl::Option*, llvm::cl::SubCommand*) (in /var/tmp/portage/sci-libs/tensorflow-2.9.1-r3/work/tensorflow-2.9.1-bazel-base/execroot/org_tensorflow/bazel-out/k8-opt-exec-50AE0418/bin/tensorflow/libtensorflow_framework.so.2.9.1)
==3134537==    by 0x5CA4DA1: llvm::cl::Option::addArgument() (in /var/tmp/portage/sci-libs/tensorflow-2.9.1-r3/work/tensorflow-2.9.1-bazel-base/execroot/org_tensorflow/bazel-out/k8-opt-exec-50AE0418/bin/tensorflow/libtensorflow_framework.so.2.9.1)
==3134537==    by 0x250ED2: llvm::codegen::RegisterCodeGenFlags::RegisterCodeGenFlags() (in /var/tmp/portage/sci-libs/tensorflow-2.9.1-r3/work/tensorflow-2.9.1-bazel-base/execroot/org_tensorflow/bazel-out/k8-opt-exec-50AE0418/bin/tensorflow/compiler/mlir/tools/kernel_gen/tf_to_kernel)
==3134537==    by 0x6DC91D2: __libc_start_main@@GLIBC_2.34 (in /lib64/libc.so.6)
==3134537==    by 0x19A310: (below main) (in /var/tmp/portage/sci-libs/tensorflow-2.9.1-r3/work/tensorflow-2.9.1-bazel-base/execroot/org_tensorflow/bazel-out/k8-opt-exec-50AE0418/bin/tensorflow/compiler/mlir/tools/kernel_gen/tf_to_kernel)
==3134537==
I packaged rocm-device-libs, rocm-comgr and hip version 5.1.3 against Gentoo's default llvm-14. Now blender can detect RDNA2 cards and render using HIP Cycles, although the [blender-3.2 demo](https://cloud.blender.org/p/gallery/629f23f908e12d4ff15241d3) is not render-demanding, so I don't see a large GPU occupation. Running `rocm-smi --showpids` gives:

```
======================= ROCm System Management Interface =======================
================================ KFD Processes =================================
KFD process information:
PID      PROCESS NAME  GPU(s)  VRAM USED  SDMA USED  CU OCCUPANCY
2214542  blender-3.2   0       0          0          0
================================================================================
============================= End of ROCm SMI Log ==============================
```

I'll try to clean up my changes and push the ROCm-5.1.3 components to Gentoo in the following weeks. Changing to upstream llvm may introduce breaking changes to existing ROCm packages, so there is still a lot to do.
> Running `rocm-smi --showpids` gives:
>
> ```
> ======================= ROCm System Management Interface =======================
> ================================ KFD Processes =================================
> KFD process information:
> PID      PROCESS NAME  GPU(s)  VRAM USED  SDMA USED  CU OCCUPANCY
> 2214542  blender-3.2   0       0          0          0
> ================================================================================
> ============================= End of ROCm SMI Log ==============================
> ```

OK, that means it is not occupying GPU memory or running the HIP kernel. Maybe I was not rendering anything. When will blender use Cycles to render?
> OK that means It is not occupying GPU memories and run the hip kernel. Maybe
> I was not rendering anything -- When will blender use cycles to render?

Oh, I need to press F12 to render. Now the RX 6700 XT is running at full speed:

```
======================= ROCm System Management Interface =======================
================================ KFD Processes =================================
KFD process information:
PID      PROCESS NAME  GPU(s)  VRAM USED   SDMA USED  CU OCCUPANCY
2525317  blender-3.2   1       2141466624  0          0
================================================================================
============================= End of ROCm SMI Log ==============================
```
On a Ryzen 5950X + Radeon RX 6700XT, I bumped blender to 3.2.0 and enabled its HIP Cycles. It built and successfully rendered the blender 3.2 demo and the 3.1 demo using HIP Cycles on the 6700XT.

Some benchmarks, rendering the blender 3.1 demo (https://cloud.blender.org/p/gallery/6220ae43b4a486f53171c89e) using Cycles:

| pure CPU | HIP 6700XT   | HIP 6700XT+5950X |
| 3m16s    | 1m54s, 1m40s | 1m24s            |

The uncertainty may be large, but this clearly shows that blender-3.2 on Gentoo is capable of using HIP Cycles on RDNA2 cards to render. I have uploaded the patch for blender-3.2.ebuild, which enables HIP Cycles.
Created attachment 785432
Patch (diff between blender-3.1.2.ebuild and 3.2.0.ebuild) enabling rocm on blender
Awesome! Did you have to change much to make rocm compile with the vanilla llvm release? I thought that AMD had changed quite a bit in their llvm version, and last time I checked they didn't provide any "disable special functionality so upstream llvm can be used" flag.
(In reply to Sebastian Parborg from comment #19)
> Awesome!
>
> Did you have to change much to make rocm compile with the vanilla llvm
> release?
> I thought that AMD had changed quite a bit in their llvm version and last
> time I checked they didn't provide any "disable special functionality so
> upstream llvm can be use" flag.

Speaking of hip, we don't have to change much; llvm/clang-14 just works out of the box (actually Debian has been shipping rocm with upstream clang, beginning from clang-13)[1]. Patches are mainly for location issues, because AMD assumes all components are in /opt/rocm. We install them under /usr, which results in the '-isystem /usr/include' flag being passed early to clang, causing a wrong order of include dirs that breaks `#include_next <math.h>`.

I do observe test failures in the test suites, which is common among all distributions packaging ROCm against upstream llvm[2]. Luckily I haven't observed blender running into those problems.

You can find all my commits in https://github.com/littlewu2508/gentoo/tree/blender-rocm. First upgrade to clang-14.0.5-r1 (with a ROCm patch fixing include dir searches), and install/upgrade rocm-device-libs, rocm-comgr and hip to 5.1.3. Then emerge blender.

I'll do some cleanup and more tests, then land the ROCm changes in ::gentoo in the following days.
And I think this may also be the ultimate solution to the previous discussion. ROCm provides the OpenCL implementation, so in blender-2.x the mixing of llvm-roc and llvm also happens. This will do the trick.
I get an error while trying to compile sci-libs/rocFFT or sci-libs/rocRAND with dev-util/hip v5.1.3:

-- Configuring done
CMake Error in library/src/CMakeLists.txt:
  Imported target "hip::device" includes non-existent path

    "HIP_CLANG_INCLUDE_PATH-NOTFOUND/.."

  in its INTERFACE_INCLUDE_DIRECTORIES.  Possible reasons include:

  * The path was deleted, renamed, or moved to another location.

  * An install or uninstall procedure did not complete successfully.

  * The installation package was faulty and references files it does not
  provide.

How can I fix this error?
(In reply to perestoronin from comment #22)
> I have got error while try compile sci-libs/rocFFT or sci-libs/rocRAND with
> dev-util/hip v5.1.3:
>
> -- Configuring done
> CMake Error in library/src/CMakeLists.txt:
> Imported target "hip::device" includes non-existent path
>
> "HIP_CLANG_INCLUDE_PATH-NOTFOUND/.."
>
> in its INTERFACE_INCLUDE_DIRECTORIES. Possible reasons include:
>
> * The path was deleted, renamed, or moved to another location.
>
> * An install or uninstall procedure did not complete successfully.
>
> * The installation package was faulty and references files it does not
> provide.
>
> How to fix this errors ?

Yes, I can reproduce that, too. I'm working on it. This is caused by the .cmake files from dev-util/hip. Switching to vanilla clang means some directory changes compared to llvm-roc, so although I patched hipcc to work with vanilla clang, the cmake modules are not working properly.
(In reply to perestoronin from comment #22)
> I have got error while try compile sci-libs/rocFFT or sci-libs/rocRAND with
> dev-util/hip v5.1.3:
> ....
> How to fix this errors ?

Updates:

I pushed some new commits to https://github.com/littlewu2508/gentoo/tree/blender-rocm, which should fix the problem. Now rocBLAS compiles, and I suppose rocFFT and rocSPARSE do as well. I also got rid of the patched clang (the hack moved to hip), so we no longer have to depend on sys-devel/clang-14.0.5-r1.

As for blender, things work normally on RDNA2 cards. I backported https://developer.blender.org/D15242 to enable pre-RDNA devices, but blender aborted when I tried to render on a Radeon VII:

```
Memory access fault by GPU node-1 (Agent handle: 0x557fab130c90) on address 0x7f6e6ffff000. Reason: Page not present or supervisor privilege.
Nearby memory map:
0x7f6e70000000, 0xa306000, VRAM
0x7f6e8a000000, 0x960000, VRAM
0x7f6e8b000000, 0x960000, VRAM
PtrInfo:
Address: 0x7f6e70000000-0x7f6e7a306000/0x7f6e70000000-0x7f6e7a306000
Size: 0xa306000
Type: 1
Owner: 0x557fab130c90
CanAccess: 1
0x557fab130c90
In block: 0x7f6e70000000, 0xa400000
PtrInfo:
Address: 0x7f6e8a000000-0x7f6e8a960000/0x7f6e8a000000-0x7f6e8a960000
Size: 0x960000
Type: 1
Owner: 0x557fab130c90
CanAccess: 1
0x557fab130c90
In block: 0x7f6e8a000000, 0xa00000
PtrInfo:
Address: 0x7f6e8b000000-0x7f6e8b960000/0x7f6e8b000000-0x7f6e8b960000
Size: 0x960000
Type: 1
Owner: 0x557fab130c90
CanAccess: 1
0x557fab130c90
In block: 0x7f6e8b000000, 0xa00000
blender-3.2: /fast/portage/dev-libs/rocr-runtime-5.1.3/work/ROCR-Runtime-rocm-5.1.3/src/core/runtime/runtime.cpp:1276: static bool rocr::core::Runtime::VMFaultHandler(hsa_signal_value_t, void*): Assertion `false && "GPU memory access fault."' failed.
```

And https://developer.blender.org/D15242 says "This needs a newer HIP SDK", which I guess means a new version of ROCm. So until then, blender-3.2 HIP Cycles only works on RDNA cards, and I reverted that backport.
(In reply to Yiyang Wu from comment #24)
> (In reply to perestoronin from comment #22)
> > I have got error while try compile sci-libs/rocFFT or sci-libs/rocRAND with
> > dev-util/hip v5.1.3:
> > ....
> > How to fix this errors ?
>
> Updates:
>
> I pushed some new commits into
> https://github.com/littlewu2508/gentoo/tree/blender-rocm, which should fix
> the problem. Now rocBLAS compiles and I suppose rocFFT and rocSPARSE as well.

rocFFT compiles too, after this patch:

--- a/library/src/include/twiddles.h
+++ b/library/src/include/twiddles.h
@@ -14,6 +14,7 @@
 #include <numeric>
 #include <tuple>
 #include <vector>
+#include <stdexcept>
 
 static const size_t LTWD_BASE_DEFAULT = 8;
 static const size_t LARGE_TWIDDLE_THRESHOLD = 4096;

> As for blender, things works normally on RDNA2 cards. I backported
> https://developer.blender.org/D15242 to enable pre-RDNA devices, but the
> blender aborted when I try to render on Radeon VII:
>
> And https://developer.blender.org/D15242 says "This needs a newer HIP SDK",
> I guess maybe a new version of ROCm. So until then blender-3.2 HIP Cycles
> only works on RDNA cards. So I reverted that backport.

No, old cards are not supported by AMD in ROCm, and on a Vega Frontier GPU blender also segfaults after attempting to use HIP in the Cycles addon. I want to find someone who can fix the amdgpu kernel drivers to work fully with blender, rocm-smi and tensorflow: https://gist.github.com/raw/0c06a9a8a38770b2cf18000ec4d18462
(In reply to perestoronin from comment #25)
> > As for blender, things works normally on RDNA2 cards. I backported
> > https://developer.blender.org/D15242 to enable pre-RDNA devices, but the
> > blender aborted when I try to render on Radeon VII:
> >
> > And https://developer.blender.org/D15242 says "This needs a newer HIP SDK",
> > I guess maybe a new version of ROCm. So until then blender-3.2 HIP Cycles
> > only works on RDNA cards. So I reverted that backport.
>
> No, old cards not supported by AMD in rocm, and on Vega Frontier GPU blender
> also segfault after attempted use HIP in cycles addon, but I want to find
> who can fix amdgpu kernel drivers to work fully with blender, rocm-smi,
> tensorflow https://gist.github.com/raw/0c06a9a8a38770b2cf18000ec4d18462

ERROR: 2 GPU[0]: % memory use: RSMI_STATUS_NOT_SUPPORTED: This function is not supported in the current environment.
ERROR: 9 GPU[0]: od volt: The called function has not been implemented in this system for this device type
ERROR: 2 GPU[0]: ras: RSMI_STATUS_NOT_SUPPORTED: This function is not supported in the current environment.
ERROR: 9 GPU[0]: od volt: The called function has not been implemented in this system for this device type
ERROR: 9 GPU[0]: od volt: The called function has not been implemented in this system for this device type
ERROR: 9 GPU[0]: od volt: The called function has not been implemented in this system for this device type
ERROR: 9 GPU[0]: od volt: The called function has not been implemented in this system for this device type
ERROR: 2 GPU[0]: % Energy Counter: RSMI_STATUS_NOT_SUPPORTED: This function is not supported in the current environment.
> No, old cards not supported by AMD in rocm, and on Vega Frontier GPU blender
> also segfault after attempted use HIP in cycles addon, but I want to find
> who can fix amdgpu kernel drivers to work fully with blender, rocm-smi,
> tensorflow https://gist.github.com/raw/0c06a9a8a38770b2cf18000ec4d18462

I don't think it's a kernel issue, it's a blob issue.
(In reply to Luke A. Guest from comment #27)
> > No, old cards not supported by AMD in rocm, and on Vega Frontier GPU blender
> > also segfault after attempted use HIP in cycles addon, but I want to find
> > who can fix amdgpu kernel drivers to work fully with blender, rocm-smi,
> > tensorflow https://gist.github.com/raw/0c06a9a8a38770b2cf18000ec4d18462
>
> I don't think it's a kernel issue, it's a blob issue.

You can find a reference to a hawaii_mec.bin.1a7 inside one of the many amd rocm issues lists, which I think explains it a bit more.
(In reply to Yiyang Wu from comment #24)
> Updates:
>
> I pushed some new commits into
> https://github.com/littlewu2508/gentoo/tree/blender-rocm, which should fix
> the problem. Now rocBLAS compiles and I suppose rocFFT and rocSPARSE as well.

I got a new error while trying to compile sci-libs/miopen v5.1.3:

CMake Error at CMakeLists.txt:309 (find_library):
  Could not find LIBMLIRMIOPEN using the following names: MLIRMIOpen

Can you fix this error?

> And https://developer.blender.org/D15242 says "This needs a newer HIP SDK",
> I guess maybe a new version of ROCm. So until then blender-3.2 HIP Cycles
> only works on RDNA cards. So I reverted that backport.

I asked https://github.com/sayakbiswas (via email, sayak90@gmail.com) to share the "new HIP SDK" with me (perestoronin@gmail.com), but got no response. If you have the "new HIP SDK", please share it with me.
(In reply to perestoronin from comment #29)
> I have got new error while try to compile sci-libs/miopen v5.1.3:
>
> CMake Error at CMakeLists.txt:309 (find_library):
> Could not find LIBMLIRMIOPEN using the following names: MLIRMIOpen
>
> Can you fix this error ?

I'll have a look. But my main focus is to refine dev-util/hip-5.1.3, land it in ::gentoo, and bump the ROCm packages in sci-libs.

Also, it is not related to blender, so shall we discuss it in https://bugs.gentoo.org/851702?

> I asked share with me (perestoronin@gmail.com) about "new HIP SDK" from
> https://github.com/sayakbiswas via email sayak90@gmail.com but not responded.
> If you have "new HIP SDK" please share it with me.

I don't have a personal relationship with him or with the blender developers, either. I think the new HIP SDK means the later releases of HIP. So I'll try to make the -9999 ebuilds work, and then we can keep up with the latest progress of HIP.
(In reply to Yiyang Wu from comment #24)
> Memory access fault by GPU node-1 (Agent handle: 0x557fab130c90) on address
> 0x7f6e6ffff000. Reason: Page not present or supervisor privilege.
> Nearby memory map:
> 0x7f6e70000000, 0xa306000, VRAM
> 0x7f6e8a000000, 0x960000, VRAM
> 0x7f6e8b000000, 0x960000, VRAM

With ROCm 5.2.0 released recently, I'm still getting this error.

> And https://developer.blender.org/D15242 says "This needs a newer HIP SDK",
> I guess maybe a new version of ROCm. So until then blender-3.2 HIP Cycles
> only works on RDNA cards. So I reverted that backport.

According to https://developer.blender.org/rBabfa09752f5c4d1fa2ae9df5e4ee0c9d77b50f3e, the required hip version is 5.2.21440, while the newest hip release is 5.2.21151 (see https://repo.radeon.com/rocm/apt/5.2/pool/main/h/hip-runtime-amd/), so I suppose we have to wait for the next patch release, ROCm 5.2.1.
(In reply to Yiyang Wu from comment #31)
> According to
> https://developer.blender.org/rBabfa09752f5c4d1fa2ae9df5e4ee0c9d77b50f3e,
> the required hip version is 5.2.21440, while the newest hip release is
> 5.2.21151 (see
> https://repo.radeon.com/rocm/apt/5.2/pool/main/h/hip-runtime-amd/), so I
> suppose we have to wait for the next patch release ROCm 5.2.1

I did a quick investigation into the hip version. The patch part of the version string (21151, 21440) is determined in bin/hipvars.pm by the variable $HIP_BASE_VERSION_PATCH [1]. It stays the same within a minor release, so ROCm 5.2.x won't be the release that makes blender work on Vega devices. There is not a single commit in HIP that introduces the patch version 21440. We should wait for ROCm 5.3 and see.

[1] https://github.com/ROCm-Developer-Tools/HIP/blob/60b60f78e6b8ed3fb2e64388b5f27771a16673e8/bin/hipvars.pm#L30
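The version gap discussed here can be checked mechanically. The snippet below is only an illustration using the two version strings quoted in this thread; the helper name `version_ge` is invented, it relies on `sort -V` from GNU coreutils, and it says nothing about how Blender itself performs the check.

```shell
#!/bin/sh
# Succeeds when version $1 is greater than or equal to version $2.
# Relies on `sort -V` (GNU coreutils version sort).
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

required=5.2.21440   # version required by the Blender commit cited above
available=5.2.21151  # newest hip release cited above

if version_ge "$available" "$required"; then
    echo "HIP $available satisfies >= $required"
else
    echo "HIP $available is older than required $required"
fi
```

With these two values the script takes the second branch, which matches the conclusion that the 5.2 series is too old.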
(In reply to Yiyang Wu from comment #30)
> (In reply to perestoronin from comment #29)
> > I have got new error while try to compile sci-libs/miopen v5.1.3:
> >
> > CMake Error at CMakeLists.txt:309 (find_library):
> > Could not find LIBMLIRMIOPEN using the following names: MLIRMIOpen
> >
> > Can you fix this error ?
>
> I'll have a look. But my main focus is to refine dev-util/hip-5.1.3 and land
> it to ::gentoo, and bump ROCm packages in sci-libs.
>
> Also it is not related to blender, so shall we discuss in
> https://bugs.gentoo.org/851702?
>
> > I asked share with me (perestoronin@gmail.com) about "new HIP SDK" from
> > https://github.com/sayakbiswas via email sayak90@gmail.com but not responded.
> > If you have "new HIP SDK" please share it with me.
>
> I don't have personal releationship to him or blender developers, either. I
> think the new HIP SDK means the later releases of HIP. So I'll try making
> those -9999 ebuild work, and then we can keep up the latest progress of HIP.

Thanks! Now test_all.sh from https://github.com/ROCm-Developer-Tools/HIP-Examples.git passes successfully after adding the following lines at the top of test_all.sh:

export HIP_PATH="/usr"
export HIP_PLATFORM="amd"

And rocBLAS now compiles successfully. I tried upgrading the ebuild to rocm v5.2.0 and got:

-- Performing Test HIP_CLANG_SUPPORTS_PARALLEL_JOBS - Failed

How can I fix these warnings?

*------------------------------- ROCMChecks WARNING --------------------------*
Options and properties should be set on a cmake target where possible. The
variable 'CMAKE_CXX_FLAGS' may be set by the cmake toolchain, either by
calling 'cmake -DCMAKE_CXX_FLAGS="-O2 -pipe -march=znver2 -Wno-unused-command-line-argument"'
or set in a toolchain file and added with
'cmake -DCMAKE_TOOLCHAIN_FILE=<toolchain-file>'

CMake Warning at /usr/share/rocm/cmake/ROCMChecks.cmake:46 (message):
  'CMAKE_CXX_FLAGS' is set at
  /var/tmp/portage/sci-libs/rocRAND-5.2.0/work/rocRAND-rocm-5.2.0/cmake/CMakeLists.txt

And how can I fix this warning?
(In reply to perestoronin from comment #33)
> And rocBLAS now compiled successful. I will try upgrade ebuild to rocm
> v5.2.0 and have got:
>
> -- Performing Test HIP_CLANG_SUPPORTS_PARALLEL_JOBS - Failed
>
> How to fix warnigns ?

Currently clang does not support parallel jobs, I suppose. Maybe llvm/clang-15 will include that support (see https://reviews.llvm.org/D69582), maybe not.

> *------------------------------- ROCMChecks WARNING
> --------------------------*
> Options and properties should be set on a cmake target where possible. The
> variable 'CMAKE_CXX_FLAGS' may be set by the cmake toolchain, either by
> calling 'cmake -DCMAKE_CXX_FLAGS="-O2 -pipe -march=znver2
> -Wno-unused-command-line-argument"'
> or set in a toolchain file and added with
> 'cmake -DCMAKE_TOOLCHAIN_FILE=<toolchain-file>'
>
> CMake Warning at /usr/share/rocm/cmake/ROCMChecks.cmake:46 (message):
> 'CMAKE_CXX_FLAGS' is set at
>
> /var/tmp/portage/sci-libs/rocRAND-5.2.0/work/rocRAND-rocm-5.2.0/cmake/
> CMakeLists.txt
>
> And how to fix warnign ?

rocRAND's and hipRAND's CMakeLists.txt contain `set(CMAKE_CXX_FLAGS`, which triggers the warning. Simply remove these blocks and let portage handle the CXX flags. We should also report that warning upstream and remind them that CMAKE_CXX_FLAGS should be set in a toolchain file rather than in CMakeLists.txt.
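Removing those statements could be done with a small `sed` call, for instance from an ebuild's src_prepare phase. This is a sketch under assumptions: the helper name `strip_cxx_flags` is made up, it uses GNU `sed -i`, and it only deletes single-line `set(CMAKE_CXX_FLAGS ...)` statements, so multi-line blocks in the real rocRAND or hipRAND sources would need a different pattern or a patch.

```shell
#!/bin/sh
# Delete single-line `set(CMAKE_CXX_FLAGS ...)` statements from a
# CMakeLists.txt so the flags come from the environment instead.
# Multi-line set() blocks are NOT handled by this sketch.
strip_cxx_flags() {
    sed -i -e '/^[[:space:]]*set[[:space:]]*([[:space:]]*CMAKE_CXX_FLAGS[[:space:]].*)[[:space:]]*$/d' "$1"
}

# Example: strip_cxx_flags path/to/CMakeLists.txt
```

Requiring whitespace right after the variable name keeps related variables such as CMAKE_CXX_FLAGS_RELEASE or CMAKE_CXX_STANDARD untouched.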
(In reply to Yiyang Wu from comment #34)
> (In reply to perestoronin from comment #33)
> > And rocBLAS now compiled successful. I will try upgrade ebuild to rocm
> > v5.2.0 and have got:
> >
> > -- Performing Test HIP_CLANG_SUPPORTS_PARALLEL_JOBS - Failed
> >
> > How to fix warnigns ?
>
> Currently clang does not support parallel jobs, I suppose. Maybe
> llvm/clang-15 will include that support, see
> https://reviews.llvm.org/D69582, maybe not.

Thanks, after adapting patch D69582 to the llvm-14.0.6 branch (the new patch can be taken from https://gist.github.com/raw/8f79f3435e1a1f600ab5cd07d401b686):

-- Performing Test HIP_CLANG_SUPPORTS_PARALLEL_JOBS - Success

> rocRAND and hipRAND's CMakeLists.txt contains `set(CMAKE_CXX_FLAGS`, which
> triggered the warning. Simply remove these blocks and handle CXX_FLAGS by
> portage. We should also report that warning to upstream, and remind them
> that CMAKE_CXX_FLAGS should be set in toolchain file rather than CMakeLists.

Thanks, done.
What is the current progress on resolving the fact that multiple LLVM versions get loaded into Blender when using HIP? This causes even the older Blender 2.x releases to trigger the llvm error when using rocm-opencl-runtime.

From what I could gather, the solution is to get the ROCm stack to build with the system's LLVM version, which will then match the other one that gets loaded?
(In reply to Rafael Ristovski from comment #36)
> What is the current progress on resolving the fact that multiple LLVM
> versions get loaded into Blender when using HIP? This causes even the older
> Blender 2.x releases to trigger the llvm error when using
> rocm-opencl-runtime.
>
> From what I could gather, the solution is to get the ROCm stack to build
> with the systems LLVM version which will then match the other one that gets
> loaded?

ROCm-5.1.3 ebuilds in Gentoo are now built against system llvm/clang-14, so using them (>=5.1.3) would be safe.
(In reply to Yiyang Wu from comment #37)
> (In reply to Rafael Ristovski from comment #36)
> > What is the current progress on resolving the fact that multiple LLVM
> > versions get loaded into Blender when using HIP? This causes even the older
> > Blender 2.x releases to trigger the llvm error when using
> > rocm-opencl-runtime.
> >
> > From what I could gather, the solution is to get the ROCm stack to build
> > with the systems LLVM version which will then match the other one that gets
> > loaded?
>
> ROCm-5.1.3 ebuilds in Gentoo are now built against system llvm/clang-14, so
> using them (>=5.1.3) would be safe.

Great! I'll try to get around to adding a HIP USE flag to Blender soonish then.

As a follow-up question, do you know if the HIP versions in portage will be bumped to 5.3.0 soon? It seems like they fixed some RDNA1 issues: https://devtalk.blender.org/t/cycles-amd-hip-device-feedback/21400/419
> Great! I'll try to get around to adding a HIP useflag to Blender soonish
> then.

I just proposed that in PR https://github.com/gentoo/gentoo/pull/27552

Please test it with RDNA2 cards. Months ago I succeeded (as mentioned in previous comments), but I haven't tested it on the new blender version.

I tried blender-2.93.10 with opencl, but that did not work due to an llvm symbol collision (although I left only one SLOT installed, there are still multiple symbols; no idea why).

> As a follow up question, do you know if the HIP versions in portage will be
> bumped to 5.3.0 soon? Seems like they fixed some RDNA1 issues:
> https://devtalk.blender.org/t/cycles-amd-hip-device-feedback/21400/419

ROCm-5.3 is not out yet, and it takes time for me to land it in Gentoo. If other developers are willing to help maintain the ROCm ebuilds, that would be nice and would speed things up.
(In reply to Yiyang Wu from comment #39)
> Just proposed that in PR https://github.com/gentoo/gentoo/pull/27552

OK, let's continue the conversation there.

> Please test it with RDNA2 cards. Months ago I succeeded (also mentioned in
> previous comments), but I don't test it on the new blender version.

Hopefully I will have some time to test it next week, but I can't promise anything.

> I tried blender-2.93.10 with opencl, but that did not work due to llvm
> symbol collision (although I left only one SLOT, there are still multiple
> symbols; no idea).

I think we can just ignore opencl at this point. When I tried it in the past, it was very unstable and would lock up the computer frequently. If it did actually render, it would be slower than my CPU. So, at least to me, there isn't really any point in spending time on trying to get that to work. Let's just focus on HIP :)

> ROCm-5.3 is not out. And it takes time for me to land it in Gentoo. If there
> are other developers willing to help maintaining ROCm ebuilds, it would be
> nice and fast.

It was released 18 hours ago: https://github.com/ROCm-Developer-Tools/hipamd/releases/tag/rocm-5.3.0
So it could be something that we could work towards.
The bug has been referenced in the following commit(s):

https://gitweb.gentoo.org/repo/gentoo.git/commit/?id=175d65e779e69e5702ca52cb3af973a2fa0b0e62

commit 175d65e779e69e5702ca52cb3af973a2fa0b0e62
Author:     Paul Zander <negril.nx+gentoo@gmail.com>
AuthorDate: 2024-03-28 22:08:25 +0000
Commit:     Sam James <sam@gentoo.org>
CommitDate: 2024-04-21 12:50:05 +0000

    media-gfx/blender: add 4.0.2-r1, cleanup

    hopefully fixed osl build
    re-added hip flag in 4.0.2-r1
    hide test code in release versions

    Bug: https://bugs.gentoo.org/693200
    Closes: https://bugs.gentoo.org/925534
    Closes: https://bugs.gentoo.org/927281
    Closes: https://bugs.gentoo.org/927715
    Closes: https://bugs.gentoo.org/927835
    Closes: https://bugs.gentoo.org/927931
    Signed-off-by: Paul Zander <negril.nx+gentoo@gmail.com>
    Closes: https://github.com/gentoo/gentoo/pull/35973
    Signed-off-by: Sam James <sam@gentoo.org>

 media-gfx/blender/blender-3.3.15.ebuild | 4 +-
 media-gfx/blender/blender-3.3.8.ebuild | 4 +-
 media-gfx/blender/blender-3.6.8.ebuild | 4 +-
 ...lender-4.0.2.ebuild => blender-4.0.2-r1.ebuild} | 128 +++++---
 media-gfx/blender/blender-9999.ebuild | 119 ++++---
 .../blender/files/blender-4.0.1-openvdb-11.patch | 2 +
 .../files/blender-4.0.2-CUDA_NVCC_FLAGS.patch | 14 +
 .../blender/files/blender-4.0.2-FindClang.patch | 14 +
 .../blender/files/blender-4.0.2-r1-osl-1.13.patch | 342 +++++++++++++++++++++
 profiles/arch/amd64/package.use.mask | 4 +
 profiles/arch/base/package.use.mask | 4 +
 11 files changed, 556 insertions(+), 83 deletions(-)