Bug 817440 - [TRACKER] Tinderbox cannot test ROCm ebuilds
Summary: [TRACKER] Tinderbox cannot test ROCm ebuilds
Status: CONFIRMED
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: Current packages
Hardware: All Linux
Importance: Normal normal
Assignee: Agostino Sarubbo
URL:
Whiteboard:
Keywords:
Duplicates: 872305
Depends on: 810619
Blocks: 810649 810700 817416 842144 842360 842363 892730 915856 817767
Reported: 2021-10-11 01:57 UTC by Benda Xu
Modified: 2023-11-26 15:27 UTC
CC List: 2 users

See Also:
Package list:
Runtime testing required: ---


Description Benda Xu gentoo-dev 2021-10-11 01:57:19 UTC
Tinderbox does not have the hardware for GPGPU.  The ROCm GPGPU ebuilds unconditionally fail.

Reproducible: Always
Comment 1 Benda Xu gentoo-dev 2021-10-11 01:58:45 UTC
More precisely, it is the test cases that fail.
Comment 2 Benda Xu gentoo-dev 2021-10-11 02:00:17 UTC
*** Bug 817416 has been marked as a duplicate of this bug. ***
Comment 3 Yiyang Wu 2021-10-11 06:14:56 UTC
The general requirements for testing the rocm-4.3 math libraries (sci-libs/roc*-4.3, sci-libs/hip*-4.3, sci-libs/miopen-4.3) are:

1. ROCm Supported GPU (See below)
2. Linux kernel >= 5.13 with AMDGPU enabled
3. Read/write access to /dev/kfd for Portage

The supported GPU architectures are: gfx803, gfx900, gfx906, gfx908, gfx90a, gfx1010, gfx1011, gfx1012, gfx1030. The mapping from products to architectures is listed at https://llvm.org/docs/AMDGPUUsage.html#processors
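
As a rough illustration, the three requirements above could be checked before running tests along the following lines. This is only a sketch: the function names are hypothetical and do not come from any ROCm or Portage tool, the GPU architecture is passed in by the caller rather than detected, and the check does not verify that AMDGPU is actually enabled in the kernel.

```python
import os
import re

# Architectures listed above as supported by the ROCm 4.3 math libraries.
SUPPORTED_ARCHS = {
    "gfx803", "gfx900", "gfx906", "gfx908", "gfx90a",
    "gfx1010", "gfx1011", "gfx1012", "gfx1030",
}


def kernel_at_least(release, minimum=(5, 13)):
    """Return True if a release string such as '5.15.0-gentoo' is >= minimum."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        return False  # unparsable release string: treat as unsupported
    return (int(match.group(1)), int(match.group(2))) >= minimum


def kfd_accessible(path="/dev/kfd"):
    """Return True if the current user can read and write the KFD device node."""
    return os.access(path, os.R_OK | os.W_OK)


def can_run_rocm_tests(arch, release=None, kfd_path="/dev/kfd"):
    """Combine the three checks; 'arch' must be supplied by the caller."""
    if release is None:
        release = os.uname().release  # kernel release of the running system
    return (
        arch in SUPPORTED_ARCHS
        and kernel_at_least(release)
        and kfd_accessible(kfd_path)
    )
```

On a machine without an AMD GPU, such as the tinderbox, /dev/kfd does not exist, so `can_run_rocm_tests` returns False whatever architecture is passed in.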
Comment 4 Agostino Sarubbo gentoo-dev 2021-10-11 07:16:42 UTC
(In reply to Benda Xu from comment #0)
> Tinderbox does not have the hardware for GPGPU.  The ROCm GPGPU ebuilds
> unconditionally fail.

Is there a way for the ebuild to die if the hw does not meet the requisites?
Comment 5 Yiyang Wu 2022-06-28 06:21:52 UTC
(In reply to Agostino Sarubbo from comment #4)
> (In reply to Benda Xu from comment #0)
> > Tinderbox does not have the hardware for GPGPU.  The ROCm GPGPU ebuilds
> > unconditionally fail.
> 
> Is there a way for the ebuild to die if the hw does not meet the requisites?

Yes, I'm writing rocm.eclass to implement a hardware check that runs before the tests.

But I wonder: if src_test dies when the hardware does not meet the requirements, won't tinderbox still fail and raise an alert? The current situation is much the same: the test program throws an error when no hardware is detected.

So should we skip the tests instead of dying if the required hardware is not present?
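
The difference between the two options can be sketched as follows. This is a hypothetical illustration, not the rocm.eclass implementation: `run_gpu_tests` and its `strict` flag are made-up names, and access to /dev/kfd stands in for the full hardware check.

```python
import os


def run_gpu_tests(run, kfd_path="/dev/kfd", strict=False):
    """Run the callable 'run' only if the KFD device node is usable.

    strict=True models "die": missing hardware is reported as an error.
    strict=False models "skip": missing hardware is reported as a skip.
    """
    if not os.access(kfd_path, os.R_OK | os.W_OK):
        if strict:
            raise RuntimeError(f"{kfd_path} is not accessible; cannot run tests")
        return "skipped"
    run()
    return "ran"
```

With `strict=True`, a machine without a GPU still sees a failure, which is what the tinderbox reports today. With `strict=False`, the same machine records a skip and carries on.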
Comment 6 Larry the Git Cow gentoo-dev 2022-09-12 09:26:50 UTC
The bug has been referenced in the following commit(s):

https://gitweb.gentoo.org/repo/gentoo.git/commit/?id=cf8a6a845b68b578772f2ae0d2703f203c6dec33

commit cf8a6a845b68b578772f2ae0d2703f203c6dec33
Author:     Yiyang Wu <xgreenlandforwyy@gmail.com>
AuthorDate: 2022-07-04 02:59:07 +0000
Commit:     Benda Xu <heroxbd@gentoo.org>
CommitDate: 2022-09-12 09:26:42 +0000

    rocm.eclass: new eclass
    
    This eclass provides utilities for ROCm libraries in
    https://github.com/ROCmSoftwarePlatform, e.g. rocBLAS, rocFFT.
    It contains a USE_EXPAND, amdgpu_targets_*, which handles the GPU
    architecture to compile, and keep targets coherent among dependencies.
    Packages that depend on ROCm libraries, like cupy, can also make use of
    this eclass, mainly specify GPU architecture and it's corresponding
    dependencies via USE_EXPAND.
    
    Closes: https://github.com/gentoo/gentoo/pull/26784
    Closes: https://bugs.gentoo.org/810619
    Bug: https://bugs.gentoo.org/817440
    Reference: https://archives.gentoo.org/gentoo-dev/message/49b17ca059187a4b5d983a9500507158
    Signed-off-by: Yiyang Wu <xgreenlandforwyy@gmail.com>
    Signed-off-by: Benda Xu <heroxbd@gentoo.org>

 eclass/rocm.eclass          | 223 ++++++++++++++++++++++++++++++++++++++++++++
 profiles/base/make.defaults |   2 +-
 2 files changed, 224 insertions(+), 1 deletion(-)
Comment 7 Benda Xu gentoo-dev 2022-09-12 09:28:52 UTC
Agostino, ROCm packages will now die early if there is no AMD GPU device available to run tests.  How do you think we should move forward?
Comment 8 Benda Xu gentoo-dev 2022-09-22 14:42:43 UTC
*** Bug 872305 has been marked as a duplicate of this bug. ***