Tinderbox does not have the hardware for GPGPU. The ROCm GPGPU ebuilds unconditionally fail. Reproducible: Always
Well, it is the test cases that fail.
*** Bug 817416 has been marked as a duplicate of this bug. ***
The general requirements for running the tests of the rocm-4.3 math libraries (sci-libs/roc*-4.3, sci-libs/hip*-4.3, sci-libs/miopen-4.3) are:

1. A ROCm-supported GPU (see below)
2. Linux kernel >= 5.13 with AMDGPU enabled
3. Portage has rw access to /dev/kfd

The supported GPU architectures are: gfx803, gfx900, gfx906, gfx908, gfx90a, gfx1010, gfx1011, gfx1012, gfx1030. The product-to-architecture map can be found at https://llvm.org/docs/AMDGPUUsage.html#processors
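As a rough illustration of requirements 2 and 3, here is a minimal POSIX shell sketch. The helper names `kfd_accessible` and `kernel_at_least_5_13` are hypothetical and do not come from any eclass. The sketch does not check the GPU architecture or whether AMDGPU is enabled in the kernel.

```shell
#!/bin/sh
# Hypothetical helper, not part of any eclass: returns 0 when the device node
# given as $1 (default /dev/kfd) exists and is readable and writable by the
# current user, non-zero otherwise.
kfd_accessible() {
    dev="${1:-/dev/kfd}"
    [ -e "$dev" ] && [ -r "$dev" ] && [ -w "$dev" ]
}

# Hypothetical helper: returns 0 when the kernel release string in $1
# (for example "5.15.0-gentoo") is at least version 5.13.
kernel_at_least_5_13() {
    release="$1"
    major="${release%%.*}"       # text before the first dot
    rest="${release#*.}"         # text after the first dot
    minor="${rest%%[!0-9]*}"     # leading digits of the remainder
    [ "$major" -gt 5 ] || { [ "$major" -eq 5 ] && [ "${minor:-0}" -ge 13 ]; }
}
```

A caller could combine the two as `kernel_at_least_5_13 "$(uname -r)" && kfd_accessible`, which succeeds only when both conditions hold for the user running the check.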
(In reply to Benda Xu from comment #0)
> Tinderbox does not have the hardware for GPGPU. The ROCm GPGPU ebuilds
> unconditionally fail.

Is there a way for the ebuild to die if the hw does not meet the requisites?
(In reply to Agostino Sarubbo from comment #4)
> (In reply to Benda Xu from comment #0)
> > Tinderbox does not have the hardware for GPGPU. The ROCm GPGPU ebuilds
> > unconditionally fail.
>
> Is there a way for the ebuild to die if the hw does not meet the requisites?

Yes, I'm writing rocm.eclass to implement a hardware check before the tests run. But I wonder: if src_test dies when the hw does not meet the requirements, won't tinderbox still fail and raise an alert? The current behaviour is much the same: the test programs throw an error when no hw is detected. So should we skip the tests instead of dying when the required hw is not present?
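To make the two options in this question concrete, here is a small sketch in plain POSIX shell. The function `run_gpu_tests` and its "skip"/"die" policy argument are hypothetical, not what rocm.eclass implements. A real ebuild would call Portage's `die` instead of returning 1.

```shell
#!/bin/sh
# Hypothetical sketch: run a test command only when the device node is
# readable and writable; otherwise either skip or fail, depending on policy.
#   $1      device node to check (e.g. /dev/kfd)
#   $2      policy: "skip" returns success without running the tests,
#           anything else reports failure
#   $3...   the test command and its arguments
run_gpu_tests() {
    dev="$1"
    policy="$2"
    shift 2
    if [ -r "$dev" ] && [ -w "$dev" ]; then
        "$@"
        return $?
    fi
    case "$policy" in
        skip) echo "SKIP: $dev inaccessible, tests not run" >&2; return 0 ;;
        *)    echo "FAIL: $dev inaccessible" >&2; return 1 ;;
    esac
}
```

With the "skip" policy a machine without the device reports success and runs no tests. With any other policy it fails before any test program starts, which at least produces one clear message instead of a test-suite error.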
The bug has been referenced in the following commit(s):

https://gitweb.gentoo.org/repo/gentoo.git/commit/?id=cf8a6a845b68b578772f2ae0d2703f203c6dec33

commit cf8a6a845b68b578772f2ae0d2703f203c6dec33
Author:     Yiyang Wu <xgreenlandforwyy@gmail.com>
AuthorDate: 2022-07-04 02:59:07 +0000
Commit:     Benda Xu <heroxbd@gentoo.org>
CommitDate: 2022-09-12 09:26:42 +0000

    rocm.eclass: new eclass

    This eclass provides utilities for ROCm libraries in
    https://github.com/ROCmSoftwarePlatform, e.g. rocBLAS, rocFFT. It contains
    a USE_EXPAND, amdgpu_targets_*, which handles the GPU architecture to
    compile, and keep targets coherent among dependencies. Packages that depend
    on ROCm libraries, like cupy, can also make use of this eclass, mainly
    specify GPU architecture and it's corresponding dependencies via USE_EXPAND.

    Closes: https://github.com/gentoo/gentoo/pull/26784
    Closes: https://bugs.gentoo.org/810619
    Bug: https://bugs.gentoo.org/817440
    Reference: https://archives.gentoo.org/gentoo-dev/message/49b17ca059187a4b5d983a9500507158
    Signed-off-by: Yiyang Wu <xgreenlandforwyy@gmail.com>
    Signed-off-by: Benda Xu <heroxbd@gentoo.org>

 eclass/rocm.eclass          | 223 ++++++++++++++++++++++++++++++++++++++++++++
 profiles/base/make.defaults |   2 +-
 2 files changed, 224 insertions(+), 1 deletion(-)
Agostino, ROCm packages will now die early if there is no AMD GPU device available to run the tests. How do you think we should move forward?
*** Bug 872305 has been marked as a duplicate of this bug. ***
(In reply to Benda Xu from comment #7)
> Agostino, now ROCm packages will die early if there is no AMD GPU device
> available to run tests. How do you think shall we move forward?

Sorry for the late reply. I'm getting "/dev/kfd inaccessible", so I think we are fine. Feel free to close all bugs that depend on this.