681806 – llvm-runtimes/openmp must be built with clang/clang++ for use cuda to be fully installed

Status:	UNCONFIRMED

Alias:	None

Product:	Gentoo Linux
Classification:	Unclassified
Component:	Current packages
Hardware:	All Linux

Importance:	Normal normal
Assignee:	Bernard Cafarelli

Reported:	2019-03-26 17:51 UTC by Robert
Modified:	2024-12-11 13:57 UTC
CC List:	4 users

Attachments
Patch for libomp-9.0.0 that correctly handles cuda libraries (libomp-9.0.0.patch, 1.16 KB, patch) 2019-12-03 14:51 UTC, Robert
Updated Patch for libomp-9.0.0 that correctly handles cuda libraries (libomp-9.0.0.patch, 1.16 KB, patch) 2019-12-03 14:55 UTC, Robert
sys-libs/libomp-10.0.0_rc1-r1.ebuild (libomp-10.0.0_rc1-r1.ebuild, 3.18 KB, text/plain) 2020-02-15 16:09 UTC, justXi

Description Robert 2019-03-26 17:51:13 UTC

If sys-libs/libomp is compiled with the cuda USE flag, it must also be compiled with the matching clang compiler in order to build all of the files that are generally wanted (specifically /usr/lib64/libomptarget-nvptx-sm_35.bc, which provides the bitcode for the builtin instructions on the GPU).  The ebuild should emit a warning that instructs the user how to compile the package with clang using Portage environments.

Additionally, libomptarget-nvptx-sm_35.bc gets installed by default to a location that is not on what clang thinks is Gentoo's library path.  Perhaps a warning could be emitted for this too, or the file could be put in one of the directories that clang searches.

See also: https://www.hahnjo.de/blog/2018/10/08/clang-7.0-openmp-offloading-nvidia.html
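A minimal sketch of the per-package Portage environment override mentioned in the description, assuming the usual /etc/portage/env and /etc/portage/package.env mechanism. The file name compiler-clang and the PORTAGE_CONFIG_DIR variable are illustrative choices and do not come from the ebuild. By default the script writes to a scratch directory, so it can be tried without touching a live system.

```shell
#!/bin/sh
# Sketch: build sys-libs/libomp with clang via a per-package Portage environment.
# PORTAGE_CONFIG_DIR is a hypothetical variable for this example; point it at
# /etc/portage (as root) to apply the override for real.
set -e
PORTAGE_CONFIG_DIR="${PORTAGE_CONFIG_DIR:-$(mktemp -d)}"

mkdir -p "${PORTAGE_CONFIG_DIR}/env"

# Environment file: select clang/clang++ as the C and C++ compilers.
cat > "${PORTAGE_CONFIG_DIR}/env/compiler-clang" <<'EOF'
CC="clang"
CXX="clang++"
EOF

# package.env entry: apply that environment to libomp only.
# (Assumes package.env is a plain file here; it may also be a directory.)
echo "sys-libs/libomp compiler-clang" >> "${PORTAGE_CONFIG_DIR}/package.env"

echo "wrote override under ${PORTAGE_CONFIG_DIR}"
```

With the override in place under /etc/portage, rebuilding with `emerge --oneshot sys-libs/libomp` should pick up clang for this package only.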

Comment 1 Michał Górny 2019-03-26 20:22:05 UTC

I don't have a problem forcing clang, as that's what we already do for compiler-rt.  Do we only need it with USE=cuda?  Could you expand a bit on the paths, and maybe provide a patch?

Comment 2 Robert 2019-03-26 20:54:27 UTC

(In reply to Michał Górny from comment #1)
> I don't have a problem forcing clang as that's what we do already for
> compiler-rt.  Do we only need it with USE=cuda?  Could you expand a bit on the
> paths, maybe provide a patch?

There may be other GPU devices that need equivalent .bc files, but the CUDA one is the one I know about, the one I need, and the only one I can test.

I could put together a patch in a few weeks.  I'd need to do some digging to figure out where clang searches for the device bitcode library.  How far back would you want the patches applied?  I think the feature was added in LLVM 7.x.

Comment 3 Robert 2019-12-02 22:21:06 UTC

I finally figured out the path clang is searching for libomptarget.  It is looking in $(llvm-config --prefix)/lib64, so CMAKE_INSTALL_PREFIX needs to be set to $(llvm-config --prefix) for libomp.
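As a rough sketch of that finding, the install prefix can be derived from llvm-config and passed to CMake. The fallback path /usr/lib/llvm/9 is only a placeholder so the snippet runs on machines without LLVM installed, and the variable names are made up for this example.

```shell
#!/bin/sh
# Sketch: derive the install prefix from llvm-config so the device bitcode
# lands in the directory clang searches.
set -e
if command -v llvm-config >/dev/null 2>&1; then
    llvm_prefix="$(llvm-config --prefix)"
else
    # Placeholder so the sketch still runs without LLVM installed; the real
    # value depends on the installed LLVM slot.
    llvm_prefix="/usr/lib/llvm/9"
fi

# Directory clang was reported to search for the libomptarget bitcode.
bitcode_dir="${llvm_prefix}/lib64"

# Argument to add to the libomp CMake configuration.
cmake_prefix_arg="-DCMAKE_INSTALL_PREFIX=${llvm_prefix}"

echo "${cmake_prefix_arg}"
echo "expected bitcode directory: ${bitcode_dir}"
```

In an ebuild the same argument would presumably be appended to the CMake argument list instead of being echoed.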

I now have a working ebuild for CUDA offloading with libomp.  I can make a patch in the next few days.

Comment 4 Robert 2019-12-03 14:51:03 UTC

Created attachment 598324 [details, diff]
Patch for libomp-9.0.0 that correctly handles cuda libraries

I've only made a patch for the version of libomp that I use.  Others should follow similarly.  Please let me know if you need help testing.

Comment 5 Robert 2019-12-03 14:55:32 UTC

Created attachment 598326 [details, diff]
Updated Patch for libomp-9.0.0 that correctly handles cuda libraries

Sorry, the previous patch was inverted.

Comment 6 justXi 2019-12-21 14:05:12 UTC

Thanks for the patch. 
I had the same problem and the patch solved it.

Currently I use libomp-9.0.0 and nvidia-cuda-toolkit-10.1.243-r1. This CUDA version needs GCC 8.x. Furthermore, as far as I understand, CUDA 10.1.x is the latest version supported by llvm/clang 9.

Comment 7 justXi 2019-12-30 13:37:14 UTC

I think a complete current list of NVPTX_COMPUTE_CAPABILITIES (https://developer.nvidia.com/cuda-gpus) should include "35,50,52,60,61,70,75" if it is enabled by the cuda USE flag. Alternatively, what about a new USE_EXPAND like "NVPTX_TARGETS" with SM35, ..., SM75?

libomp-10.* seems to be getting support for AMDGCN, but this is currently experimental and would build only if the right tools are installed.

https://github.com/llvm/llvm-project/blob/master/openmp/libomptarget/deviceRTLs/amdgcn/CMakeLists.txt

There is LIBOMPTARGET_AMDGCN_GFXLIST initialized with "gfx700 gfx701 gfx801 gfx803 gfx900".

Comment 8 Robert 2019-12-30 18:16:17 UTC

justXi, I think a new USE_EXPAND option is the right idea here.  However, if we are going to do that, we should make some corresponding changes to sys-devel/clang to set the default PTX targets.  Specifically, we should add an ebuild variable to set `CLANG_OPENMP_NVPTX_DEFAULT_ARCH` in the cmake configuration.

Comment 9 justXi 2020-01-02 13:52:12 UTC

So we would get:

NVPTX_DEFAULT_TARGET=^^ ( SM35 ... SM75 )
affects "CLANG_OPENMP_NVPTX_DEFAULT_ARCH"

NVPTX_TARGETS=|| ( SM35 ... SM75 )
affects "LIBOMPTARGET_NVPTX_COMPUTE_CAPABILITIES"
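One way the proposed flags could map onto the two CMake options, as a sketch only. The shell variables nvptx_targets and nvptx_default_target are hypothetical stand-ins for whatever a USE_EXPAND would provide, and the selected values are arbitrary examples. It also assumes the compute capabilities are passed as a comma-separated list (as in the list quoted in comment 7) and the default arch in sm_NN form (as in the bitcode file name).

```shell
#!/bin/sh
# Sketch: turn the proposed NVPTX_TARGETS / NVPTX_DEFAULT_TARGET selections
# into CMake arguments. The two input variables are hypothetical; they are
# not existing Gentoo variables.
set -e
nvptx_targets="SM35 SM60 SM70"
nvptx_default_target="SM35"

# Strip the SM prefix and join with commas: "SM35 SM60 SM70" -> "35,60,70".
capabilities=""
for target in ${nvptx_targets}; do
    capabilities="${capabilities:+${capabilities},}${target#SM}"
done

# Default arch in sm_NN form: "SM35" -> "sm_35".
default_arch="sm_${nvptx_default_target#SM}"

# For sys-libs/libomp:
libomp_arg="-DLIBOMPTARGET_NVPTX_COMPUTE_CAPABILITIES=${capabilities}"
# For sys-devel/clang:
clang_arg="-DCLANG_OPENMP_NVPTX_DEFAULT_ARCH=${default_arch}"

echo "${libomp_arg}"
echo "${clang_arg}"
```

Keeping the default target as a separate single-choice flag matches the ^^ versus || distinction proposed above: exactly one default architecture for clang, one or more capabilities for libomp.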

Comment 10 justXi 2020-02-15 16:09:58 UTC

Created attachment 613952 [details]
sys-libs/libomp-10.0.0_rc1-r1.ebuild