Bug 928580 - sci-libs/caffe2-2.2.1-r1: nvcc fatal : Unsupported gpu architecture 'compute_35'
Summary: sci-libs/caffe2-2.2.1-r1: nvcc fatal : Unsupported gpu architecture 'comput...
Status: UNCONFIRMED
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: Current packages
Hardware: All Linux
Importance: Normal normal
Assignee: Tupone Alfredo
Duplicates: 928579
Reported: 2024-04-04 14:23 UTC by look
Modified: 2024-04-09 02:29 UTC
CC List: 3 users



Attachments
The build log. (build.log.gz, 132.97 KB, application/gzip)
2024-04-04 14:23 UTC, look
The emerge --info output (emerge-info-output.txt, 19.61 KB, text/plain)
2024-04-04 14:24 UTC, look
The build environment. (environment, 152.31 KB, text/plain)
2024-04-04 14:24 UTC, look
This is the build log after setting TORCH_CUDA_ARCH_LIST and using GCC12 (build-with-cuda-arch-and-gcc12.log.gz, 129.65 KB, application/gzip)
2024-04-05 16:24 UTC, look
The new (3rd) build.log. (build-log-3.log, 27.97 KB, text/x-log)
2024-04-09 02:29 UTC, look

Description look 2024-04-04 14:23:01 UTC
Hi. I'm trying to build caffe2 for pytorch, and the build fails without showing a descriptive compilation error:

ninja: build stopped: subcommand failed.
 * ERROR: sci-libs/caffe2-2.2.1-r1::gentoo failed (compile phase):
 *   ninja -v -j32 -l32 failed
 * 
 * Call stack:
 *     ebuild.sh, line  136:  Called src_compile
 *   environment, line 3255:  Called cmake_src_compile
 *   environment, line 1330:  Called cmake_build
 *   environment, line 1297:  Called eninja
 *   environment, line 1878:  Called die
 * The specific snippet of code:
 *       "$@" || die -n "${*} failed"
Comment 1 look 2024-04-04 14:23:49 UTC
Created attachment 889403
The build log.
Comment 2 look 2024-04-04 14:24:00 UTC
Created attachment 889404
The emerge --info output
Comment 3 look 2024-04-04 14:24:09 UTC
Created attachment 889405
The build environment.
Comment 4 Paul Zander 2024-04-04 14:27:13 UTC
*** Bug 928579 has been marked as a duplicate of this bug. ***
Comment 5 Paul Zander 2024-04-04 14:34:53 UTC
> nvcc fatal   : Unsupported gpu architecture 'compute_35'

> WARNING: caffe2 is being built with its default CUDA compute capabilities: 3.5 and 7.0.
> These may not be optimal for your GPU.
> 
> To configure caffe2 with the CUDA compute capability that is optimal for your GPU,
> set TORCH_CUDA_ARCH_LIST in your make.conf, and re-emerge caffe2.
> For example, to use CUDA capability 7.5 & 3.5, add: TORCH_CUDA_ARCH_LIST=7.5 3.5
> For a Maxwell model GPU, an example value would be: TORCH_CUDA_ARCH_LIST=Maxwell
> 
> You can look up your GPU's CUDA compute capability at https://developer.nvidia.com/cuda-gpus
> or by running /opt/cuda/extras/demo_suite/deviceQuery | grep 'CUDA Capability'

Nevertheless nvidia-cuda-toolkit-12 only supports 5.0+.
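
The mismatch between the default list (3.5 and 7.0) and the toolkit's minimum can be checked before starting a build. The following is only a minimal sketch: the helper name `filter_arch_list` is hypothetical, the 5.0 minimum is taken from the comment above, and named values such as `Maxwell` are not handled.

```shell
#!/bin/sh
# Hypothetical helper (not part of the ebuild): keep only the numeric
# compute capabilities in a TORCH_CUDA_ARCH_LIST-style string that are
# greater than or equal to a given minimum.
# Usage: filter_arch_list "3.5 7.0" 5.0
filter_arch_list() {
    min=$2
    out=""
    for arch in $1; do
        # awk performs the floating-point comparison; exit status 0 means "keep".
        if awk -v a="$arch" -v m="$min" 'BEGIN { exit !(a + 0 >= m + 0) }'; then
            out="${out:+$out }$arch"
        fi
    done
    printf '%s\n' "$out"
}

# The default list quoted above, checked against the 5.0 minimum.
filter_arch_list "3.5 7.0" 5.0
```

Any entry that the filter drops is one that nvcc from this toolkit version would be expected to reject.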
Comment 6 look 2024-04-04 16:46:54 UTC
I added TORCH_CUDA_ARCH_LIST="6.1" and I still get the same error.
Comment 7 look 2024-04-04 22:30:47 UTC
I've tried switching CFLAGS to "-march=native -O2 -pipe" and using GCC12 & GCC13, and it fails in both cases even though arch=compute_61. I honestly don't have any other ideas, and the error is not descriptive.
Comment 8 Paul Zander 2024-04-05 11:11:35 UTC
Can you add the build.log for TORCH_CUDA_ARCH_LIST="6.1"?
Comment 9 look 2024-04-05 16:24:43 UTC
Created attachment 889534
This is the build log after setting TORCH_CUDA_ARCH_LIST and using GCC12

So I got a similar error in https://bugs.gentoo.org/928605 with media-libs/opencv and I was able to fix it by using GCC12 (as well as disabling the sandbox) based on this Reddit post: https://www.reddit.com/r/Gentoo/comments/1arlsfi/cuda_gcc_too_recent/. However, I have not been able to fix sci-libs/caffe2 with the same approach.
Comment 10 Paul Zander 2024-04-08 14:13:24 UTC
Setting CC="gcc-12" CXX="g++-12" is causing:

> /usr/lib/gcc/x86_64-pc-linux-gnu/12/../../../../x86_64-pc-linux-gnu/bin/ld: /usr/lib64/libprotobuf.so.23.3.0: undefined reference to `std::ios_base_library_init()@GLIBCXX_3.4.32'
> /usr/lib/gcc/x86_64-pc-linux-gnu/12/../../../../x86_64-pc-linux-gnu/bin/ld: /usr/lib64/libprotobuf.so.23.3.0: undefined reference to `__cxa_call_terminate@CXXABI_1.3.15'

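This means protobuf was compiled with gcc-13, so it requires libstdc++ symbol versions that the gcc-12 runtime used for this link does not provide.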
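
To see at a glance which library needs which symbol version, the linker messages can be reduced to unique "library version" pairs. This is a sketch only: the function name `missing_symbol_versions` is hypothetical, and the pattern assumes GNU ld messages in the form quoted above.

```shell
#!/bin/sh
# Hypothetical helper: read linker output on stdin and print each unique
# "library required-symbol-version" pair found in
# "undefined reference to `sym@VERSION'" messages.
missing_symbol_versions() {
    sed -n "s/.*: \([^: ]*\.so[^: ]*\): undefined reference to .*@\([A-Z_]*[0-9.]*\)'.*/\1 \2/p" | sort -u
}

# Demo using the two linker lines quoted in this comment.
sample=$(mktemp)
cat > "$sample" <<'EOF'
/usr/lib/gcc/x86_64-pc-linux-gnu/12/../../../../x86_64-pc-linux-gnu/bin/ld: /usr/lib64/libprotobuf.so.23.3.0: undefined reference to `std::ios_base_library_init()@GLIBCXX_3.4.32'
/usr/lib/gcc/x86_64-pc-linux-gnu/12/../../../../x86_64-pc-linux-gnu/bin/ld: /usr/lib64/libprotobuf.so.23.3.0: undefined reference to `__cxa_call_terminate@CXXABI_1.3.15'
EOF
missing_symbol_versions < "$sample"
rm -f "$sample"
```

Running `objdump -T` on the reported library lists the versioned symbols it imports, which can confirm the same requirement directly from the binary.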

You don't _ever_ need to set CC or CXX to make cuda work.

For most cuda related packages you only need to set the cuda host compiler and the arch.

See https://wiki.gentoo.org/wiki/User:Negril/CUDA.

For caffe2, for now you need to add:
> export TORCH_CUDA_ARCH_LIST="6.1"

Making the full env file:
> CUDA_VERBOSE="false"
> CUDAHOSTCXX="/usr/x86_64-pc-linux-gnu/gcc-bin/12"
> TORCH_CUDA_ARCH_LIST="6.1"
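
One possible way to apply these settings to caffe2 only is Portage's per-package environment mechanism. The sketch below assumes "env file" refers to a file under /etc/portage/env; the file name `cuda.conf` and the `PORTAGE_DIR` variable are hypothetical. It writes into a scratch directory by default so the result can be reviewed before anything is copied to /etc/portage.

```shell
#!/bin/sh
# Sketch: generate the env file from the comment above plus a matching
# package.env entry. PORTAGE_DIR (hypothetical) defaults to a scratch
# directory; nothing under /etc/portage is touched unless it is set.
PORTAGE_DIR=${PORTAGE_DIR:-$(mktemp -d)}
mkdir -p "$PORTAGE_DIR/env"

# The three settings exactly as given in the comment.
cat > "$PORTAGE_DIR/env/cuda.conf" <<'EOF'
CUDA_VERBOSE="false"
CUDAHOSTCXX="/usr/x86_64-pc-linux-gnu/gcc-bin/12"
TORCH_CUDA_ARCH_LIST="6.1"
EOF

# package.env maps a package atom to one or more files under env/.
# Assumption: package.env is a plain file here; on some systems it is a directory.
printf '%s\n' 'sci-libs/caffe2 cuda.conf' >> "$PORTAGE_DIR/package.env"

echo "wrote $PORTAGE_DIR/env/cuda.conf and $PORTAGE_DIR/package.env"
```

Keeping the settings in a per-package file rather than in make.conf limits them to the package that needs them.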
Comment 11 look 2024-04-09 02:29:55 UTC
Created attachment 889965
The new (3rd) build.log.

Hi. So I rebuilt world and now the error I'm getting when building caffe2 is:

CMake Error in torch/CMakeLists.txt:
  Imported target "pybind::pybind11" includes non-existent path

    "/include"

  in its INTERFACE_INCLUDE_DIRECTORIES.  Possible reasons include:

  * The path was deleted, renamed, or moved to another location.

  * An install or uninstall procedure did not complete successfully.

  * The installation package was faulty and references files it does not
  provide.



CMake Error in functorch/CMakeLists.txt:
  Imported target "pybind::pybind11" includes non-existent path

    "/include"

  in its INTERFACE_INCLUDE_DIRECTORIES.  Possible reasons include:

  * The path was deleted, renamed, or moved to another location.

  * An install or uninstall procedure did not complete successfully.

  * The installation package was faulty and references files it does not
  provide.
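
Because CMake repeats this message for each affected target, a per-file count can make a long build log easier to read. A minimal sketch, with a hypothetical function name `cmake_error_summary`, assuming the "CMake Error in <file>:" header format shown above:

```shell
#!/bin/sh
# Hypothetical helper: read a build log on stdin and print how many
# "CMake Error in <file>:" headers were reported for each file.
cmake_error_summary() {
    awk '/^CMake Error in / {
             sub(/^CMake Error in /, "")   # drop the fixed prefix
             sub(/:$/, "")                 # drop the trailing colon
             n[$0]++
         }
         END { for (f in n) print n[f], f }' | sort -k2
}

# Demo on a shortened log excerpt; a real log would be passed as
#   cmake_error_summary < build.log
cmake_error_summary <<'EOF'
CMake Error in torch/CMakeLists.txt:
  Imported target "pybind::pybind11" includes non-existent path
CMake Error in functorch/CMakeLists.txt:
  Imported target "pybind::pybind11" includes non-existent path
EOF
```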