If sys-libs/libomp is compiled with the cuda USE flag, it must also be built with the matching clang compiler in order to produce all of the generally wanted files (specifically /usr/lib64/libomptarget-nvptx-sm_35.bc, which provides the bitcode for the builtin instructions on the GPU). The ebuild should emit a warning that instructs the user how to build the package with clang using Portage environments. Additionally, libomptarget-nvptx-sm_35.bc gets installed by default to a location that is not on what clang thinks is Gentoo's library path. Perhaps a warning could be emitted for this too, or the file could be put in one of the directories that clang searches. See also: https://www.hahnjo.de/blog/2018/10/08/clang-7.0-openmp-offloading-nvidia.html
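For reference, the kind of Portage environment override such a warning could point users to might look like the sketch below. The env file name "compiler-clang" and the helper name write_clang_env are made up for illustration, and the sketch assumes package.env is a plain file rather than a directory.

```shell
# Sketch only: write a per-package Portage environment that builds
# sys-libs/libomp with clang. "compiler-clang" is an arbitrary file name.
# $1: config root to write under (empty means the live /etc/portage).
write_clang_env() {
    root="${1:-}"
    mkdir -p "${root}/etc/portage/env" || return 1
    # Compiler override that applies only to packages listed in package.env.
    printf '%s\n' 'CC="clang"' 'CXX="clang++"' \
        > "${root}/etc/portage/env/compiler-clang" || return 1
    # Assumption: package.env is a regular file, not a directory.
    printf '%s\n' 'sys-libs/libomp compiler-clang' \
        >> "${root}/etc/portage/package.env"
}
```

Running `write_clang_env` as root with no argument would write to the live configuration; passing a scratch directory first makes it easy to inspect the result.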
I don't have a problem forcing clang, as that's what we already do for compiler-rt. Do we only need it with USE=cuda? Could you expand a bit on the paths, and maybe provide a patch?
(In reply to Michał Górny from comment #1)
> I don't have a problem forcing clang as that's what we do already for
> compiler-rt. Do we only need with USE=cuda? Could you expand a bit on the
> paths, maybe provide a patch?

There may be other gpu devices that need equivalents of bc files, but the cuda one is the one I know about, and the one I need, and the only one I can test. I could toss together a patch in a few weeks. I'd need to do some digging to figure out where clang is searching for the device bitcode library. How far back would you want the patches applied? I think the feature was added in llvm7.x.
I finally figured out the path where clang searches for libomptarget. It looks in $(llvm-config --prefix)/lib64, so CMAKE_INSTALL_PREFIX needs to be set to $(llvm-config --prefix) for libomp. I now have a working ebuild for CUDA offloading with libomp. I can make a patch in the next few days.
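A rough sketch of that relationship, not the patch itself: the two helper names below are hypothetical, and /usr/lib/llvm/9 in the usage note is only an example prefix.

```shell
# Sketch only: helper names are made up for illustration.
# $1: LLVM prefix; falls back to `llvm-config --prefix` when omitted.

# Directory where clang looks for the device bitcode library.
omptarget_bc_dir() {
    prefix="${1:-$(llvm-config --prefix)}"
    printf '%s\n' "${prefix}/lib64"
}

# CMake argument that makes libomp install under the same prefix.
libomp_prefix_arg() {
    prefix="${1:-$(llvm-config --prefix)}"
    printf '%s\n' "-DCMAKE_INSTALL_PREFIX=${prefix}"
}
```

For example, `libomp_prefix_arg /usr/lib/llvm/9` prints `-DCMAKE_INSTALL_PREFIX=/usr/lib/llvm/9`, and `omptarget_bc_dir /usr/lib/llvm/9` prints `/usr/lib/llvm/9/lib64`. In the ebuild the argument would presumably be added to the cmake arguments in src_configure.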
Created attachment 598324 [details, diff]
Patch for libomp-9.0.0 that correctly handles cuda libraries

I've only made a patch for the version of libomp that I use. Patches for other versions should follow the same pattern. Please let me know if you need help testing.
Created attachment 598326 [details, diff]
Updated Patch for libomp-9.0.0 that correctly handles cuda libraries

Sorry, the previous patch was inverted.
Thanks for the patch. I had the same problem and the patch solved it. Currently I use libomp-9.0.0 and nvidia-cuda-toolkit-10.1.243-r1. This CUDA version needs GCC 8.x. Furthermore, as far as I understand, CUDA 10.1.x is the latest version supported by llvm/clang 9.
I think a complete, current list for NVPTX_COMPUTE_CAPABILITIES (https://developer.nvidia.com/cuda-gpus) should include "35,50,52,60,61,70,75" if it is enabled by the cuda USE flag. Or what about a new USE_EXPAND like "NVPTX_TARGETS" with SM35, ..., SM75? libomp-10.* seems to be getting support for AMDGCN, but this is currently experimental and would build only if the right tools are installed. See https://github.com/llvm/llvm-project/blob/master/openmp/libomptarget/deviceRTLs/amdgcn/CMakeLists.txt where LIBOMPTARGET_AMDGCN_GFXLIST is initialized with "gfx700 gfx701 gfx801 gfx803 gfx900".
justXi, I think a new USE_EXPAND option is the right idea here. However, if we are going to do that, we should make some corresponding changes to sys-devel/clang to set the default PTX target. Specifically, we should add an ebuild variable that sets `CLANG_OPENMP_NVPTX_DEFAULT_ARCH` in the cmake configuration.
So we would get:
NVPTX_DEFAULT_TARGET=^^ ( SM35 ... SM75 ) affects "CLANG_OPENMP_NVPTX_DEFAULT_ARCH"
NVPTX_TARGETS=|| ( SM35 ... SM75 ) affects "LIBOMPTARGET_NVPTX_COMPUTE_CAPABILITIES"
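To make the mapping concrete, here is a minimal sketch of how the proposed flag names could be translated into the two cmake values. The function names are hypothetical, and a real ebuild would query the enabled flags with `use` instead of taking them as arguments.

```shell
# Sketch only: hypothetical helpers, not taken from any existing ebuild.

# Turn flags such as "SM35 SM70" into the comma-separated list form
# used for LIBOMPTARGET_NVPTX_COMPUTE_CAPABILITIES, e.g. "35,70".
nvptx_capabilities() {
    caps=""
    for flag in "$@"; do
        case "${flag}" in
            SM[0-9][0-9]) cap="${flag#SM}" ;;
            *) echo "unknown NVPTX target: ${flag}" >&2; return 1 ;;
        esac
        caps="${caps:+${caps},}${cap}"
    done
    printf '%s\n' "${caps}"
}

# Turn a single flag such as "SM35" into the "sm_35" form used for
# CLANG_OPENMP_NVPTX_DEFAULT_ARCH.
nvptx_default_arch() {
    case "$1" in
        SM[0-9][0-9]) printf 'sm_%s\n' "${1#SM}" ;;
        *) echo "unknown NVPTX target: $1" >&2; return 1 ;;
    esac
}
```

With this, `nvptx_capabilities SM35 SM70` prints `35,70` and `nvptx_default_arch SM35` prints `sm_35`. Unknown flag names fail loudly rather than being passed through to cmake.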
Created attachment 613952 [details] sys-libs/libomp-10.0.0_rc1-r1.ebuild