Hi. I'm trying to build caffe2 for pytorch, but the build fails without reporting any actual compilation error:

ninja: build stopped: subcommand failed.
 * ERROR: sci-libs/caffe2-2.2.1-r1::gentoo failed (compile phase):
 *   ninja -v -j32 -l32 failed
 *
 * Call stack:
 *     ebuild.sh, line  136:  Called src_compile
 *   environment, line 3255:  Called cmake_src_compile
 *   environment, line 1330:  Called cmake_build
 *   environment, line 1297:  Called eninja
 *   environment, line 1878:  Called die
 * The specific snippet of code:
 *       "$@" || die -n "${*} failed"
Created attachment 889403: the build log.
Created attachment 889404: the emerge --info output.
Created attachment 889405: the build environment.
*** Bug 928579 has been marked as a duplicate of this bug. ***
> nvcc fatal : Unsupported gpu architecture 'compute_35'

> WARNING: caffe2 is being built with its default CUDA compute capabilities: 3.5 and 7.0.
> These may not be optimal for your GPU.
>
> To configure caffe2 with the CUDA compute capability that is optimal for your GPU,
> set TORCH_CUDA_ARCH_LIST in your make.conf, and re-emerge caffe2.
> For example, to use CUDA capability 7.5 & 3.5, add: TORCH_CUDA_ARCH_LIST=7.5 3.5
> For a Maxwell model GPU, an example value would be: TORCH_CUDA_ARCH_LIST=Maxwell
>
> You can look up your GPU's CUDA compute capability at https://developer.nvidia.com/cuda-gpus
> or by running /opt/cuda/extras/demo_suite/deviceQuery | grep 'CUDA Capability'

However, nvidia-cuda-toolkit-12 only supports compute capability 5.0 and newer, so the default 3.5 cannot be built.
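A minimal sketch of the lookup step from the message above: the hypothetical shell function arch_list_from_devicequery reads deviceQuery output (assuming its "CUDA Capability Major/Minor version number: X.Y" line format), collects one capability per GPU, and prints a TORCH_CUDA_ARCH_LIST line for make.conf. It also warns about anything below 5.0, since nvidia-cuda-toolkit-12 cannot build for those.

```shell
#!/usr/bin/env bash
# arch_list_from_devicequery (hypothetical helper): reads deviceQuery output
# on stdin, keeps the last field of each "CUDA Capability" line (e.g. "6.1"),
# de-duplicates them and prints a space-separated TORCH_CUDA_ARCH_LIST line.
arch_list_from_devicequery() {
    local caps cap
    caps=$(grep 'CUDA Capability' | awk '{print $NF}' | sort -u | tr '\n' ' ')
    caps=${caps% }   # drop the trailing space left by tr
    if [[ -z $caps ]]; then
        echo "no CUDA capability found in input" >&2
        return 1
    fi
    for cap in $caps; do
        # CUDA 12 only supports compute capability 5.0 and newer.
        if (( ${cap%%.*} < 5 )); then
            echo "warning: capability $cap is not supported by CUDA 12" >&2
        fi
    done
    printf 'TORCH_CUDA_ARCH_LIST="%s"\n' "$caps"
}

# Usage (assumes the CUDA demo suite is installed):
#   /opt/cuda/extras/demo_suite/deviceQuery | arch_list_from_devicequery
```

The printed line can be pasted into make.conf as the ebuild message suggests; the space-separated form matches the "7.5 3.5" example it gives.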
I added TORCH_CUDA_ARCH_LIST="6.1" and I still get the same error.
I've tried switching CFLAGS to "-march=native -O2 -pipe" and building with both GCC 12 and GCC 13, but it fails in both cases even though the log now shows arch=compute_61. I honestly don't have any other ideas, and the error is not descriptive.
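When ninja only reports "build stopped: subcommand failed", the underlying error is usually printed right after the first line beginning with "FAILED: ", which ninja emits for each failing step. A minimal sketch for pulling that block out of a Portage build log, assuming ninja's usual output format; first_ninja_failure is a hypothetical helper, not part of Portage.

```shell
#!/usr/bin/env bash
# first_ninja_failure LOG [LINES] (hypothetical helper): print the first
# "FAILED: " line from a ninja build log plus LINES lines of trailing
# context (default 20), with line numbers. Compiler or nvcc messages such
# as "nvcc fatal" normally appear in that context.
first_ninja_failure() {
    local log=$1 context=${2:-20}
    # -m 1 stops after the first match; GNU grep still prints its -A context.
    grep -n -m 1 -A "$context" '^FAILED: ' "$log"
}

# Usage (assumes the default PORTAGE_TMPDIR of /var/tmp):
#   first_ninja_failure /var/tmp/portage/sci-libs/caffe2-2.2.1-r1/temp/build.log
```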
Can you attach the build.log for TORCH_CUDA_ARCH_LIST="6.1"?
Created attachment 889534: the build log after setting TORCH_CUDA_ARCH_LIST and using GCC 12.

I got a similar error with media-libs/opencv in https://bugs.gentoo.org/928605 and was able to fix it by using GCC 12 (and disabling the sandbox), based on this Reddit post: https://www.reddit.com/r/Gentoo/comments/1arlsfi/cuda_gcc_too_recent/. However, the same approach has not fixed sci-libs/caffe2.
Setting CC="gcc-12" CXX="g++-12" is causing:
> /usr/lib/gcc/x86_64-pc-linux-gnu/12/../../../../x86_64-pc-linux-gnu/bin/ld: /usr/lib64/libprotobuf.so.23.3.0: undefined reference to `std::ios_base_library_init()@GLIBCXX_3.4.32'
> /usr/lib/gcc/x86_64-pc-linux-gnu/12/../../../../x86_64-pc-linux-gnu/bin/ld: /usr/lib64/libprotobuf.so.23.3.0: undefined reference to `__cxa_call_terminate@CXXABI_1.3.15'

This means protobuf was compiled with gcc-13.

You don't _ever_ need to set CC or CXX to make CUDA work. For most CUDA-related packages you only need to set the CUDA host compiler and the arch. See https://wiki.gentoo.org/wiki/User:Negril/CUDA.

For caffe2 you currently also need to add:
> export TORCH_CUDA_ARCH_LIST="6.1"

So the full env file becomes:
> CUDA_VERBOSE="false"
> CUDAHOSTCXX="/usr/x86_64-pc-linux-gnu/gcc-bin/12"
> TORCH_CUDA_ARCH_LIST="6.1"
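One way to apply such an env file only to caffe2, rather than exporting CC/CXX globally, is Portage's per-package environment: a file in /etc/portage/env plus a mapping line in /etc/portage/package.env. The sketch below writes both, assuming that standard layout; the file name caffe2-cuda.conf and the function write_caffe2_cuda_env are hypothetical, and the values are the ones from the comment above.

```shell
#!/usr/bin/env bash
# write_caffe2_cuda_env [ROOT] (hypothetical helper): ROOT defaults to ""
# (the live system); pass a scratch directory to try it without touching
# /etc/portage.
write_caffe2_cuda_env() {
    local root=${1:-}
    local envdir="$root/etc/portage/env"
    local pkgenv="$root/etc/portage/package.env"
    local entry="sci-libs/caffe2 caffe2-cuda.conf"

    mkdir -p "$envdir" || return 1
    # Values copied from the comment; adjust TORCH_CUDA_ARCH_LIST to your GPU.
    printf '%s\n' \
        'CUDA_VERBOSE="false"' \
        'CUDAHOSTCXX="/usr/x86_64-pc-linux-gnu/gcc-bin/12"' \
        'TORCH_CUDA_ARCH_LIST="6.1"' > "$envdir/caffe2-cuda.conf" || return 1

    # package.env may be a single file or a directory of files.
    if [[ -d $pkgenv ]]; then
        pkgenv="$pkgenv/caffe2"
    fi
    # Append the atom-to-file mapping only once, so re-running is harmless.
    if ! grep -qxF "$entry" "$pkgenv" 2>/dev/null; then
        printf '%s\n' "$entry" >> "$pkgenv" || return 1
    fi
}

# Usage (as root): write_caffe2_cuda_env && emerge --ask --oneshot sci-libs/caffe2
```

Keeping the settings in a per-package file means other packages, such as protobuf, keep building with the default compiler, which avoids the GLIBCXX/CXXABI mismatch quoted above.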
Created attachment 889965: the new (3rd) build.log.

Hi. So I rebuilt world, and now the error I'm getting when building caffe2 is:

CMake Error in torch/CMakeLists.txt:
  Imported target "pybind::pybind11" includes non-existent path
    "/include"
  in its INTERFACE_INCLUDE_DIRECTORIES. Possible reasons include:
  * The path was deleted, renamed, or moved to another location.
  * An install or uninstall procedure did not complete successfully.
  * The installation package was faulty and references files it does not provide.

CMake Error in functorch/CMakeLists.txt:
  Imported target "pybind::pybind11" includes non-existent path
    "/include"
  in its INTERFACE_INCLUDE_DIRECTORIES. Possible reasons include:
  * The path was deleted, renamed, or moved to another location.
  * An install or uninstall procedure did not complete successfully.
  * The installation package was faulty and references files it does not provide.
The bug has been closed via the following commit(s):

https://gitweb.gentoo.org/repo/gentoo.git/commit/?id=7e11aa2639192352d26804f1e45136343ea95844

commit 7e11aa2639192352d26804f1e45136343ea95844
Author:     Alfredo Tupone <tupone@gentoo.org>
AuthorDate: 2024-10-27 14:00:28 +0000
Commit:     Alfredo Tupone <tupone@gentoo.org>
CommitDate: 2024-10-27 14:00:28 +0000

    sci-libs/caffe2: drop 2.3.0-r3, 2.3.1

    Closes: https://bugs.gentoo.org/942335
    Closes: https://bugs.gentoo.org/928580
    Signed-off-by: Alfredo Tupone <tupone@gentoo.org>

 sci-libs/caffe2/Manifest               |   2 -
 sci-libs/caffe2/caffe2-2.3.0-r3.ebuild | 294 ---------------------------------
 sci-libs/caffe2/caffe2-2.3.1.ebuild    | 294 ---------------------------------
 sci-libs/caffe2/metadata.xml           |   2 -
 4 files changed, 592 deletions(-)