When compiling tensorflow with CUDA support, the user can set the CUDA compute capability level (see https://developer.nvidia.com/cuda-gpus for more info). It's analogous to gcc's -march flag: GPU kernels compiled for a lower CUDA compute capability will run on newer hardware, but less efficiently. Matching the build to the compute capability supported by the user's GPU is preferable, especially for a compute-intensive application like tensorflow.

Tensorflow's ./configure allows the compute capability level to be set in one of three ways:

#1. Running /opt/cuda/extras/demo_suite/deviceQuery and parsing the output
#2. Prompting the user during the ./configure script
#3. Reading the environment variable TF_CUDA_COMPUTE_CAPABILITIES

#1 fails because sandboxing doesn't allow external programs to run. #2 fails because the ebuild doesn't allow interaction with the ./configure script while it's running. #3 fails because even if the user exports TF_CUDA_COMPUTE_CAPABILITIES in the shell before emerging tensorflow, that environment variable isn't passed through to the configure script.

So tensorflow, when built on Gentoo, always falls back to the default, which is to build GPU kernels for both the 3.5 and 7.0 capability levels. This is bad for users whose GPUs have other capability levels, because they get less efficient compute kernels, and it also makes the build take significantly longer, because every GPU kernel has to be compiled twice.

This ebuild defect can be worked around by setting TF_CUDA_COMPUTE_CAPABILITIES via /etc/portage/env. ebuild hackers may be able to figure out a better way to handle this. But for now, I suggest displaying some kind of warning message at configure time if the TF_CUDA_COMPUTE_CAPABILITIES environment variable is not set, along with brief instructions for setting it, so that users know that their build may be falling back to the wrong compute capabilities.

Reproducible: Always
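For reference, the /etc/portage/env workaround looks roughly like this (the 7.5 value and the tensorflow.conf file name are examples, not prescribed names; use your GPU's actual capability):

```
# /etc/portage/env/tensorflow.conf  (example file name)
TF_CUDA_COMPUTE_CAPABILITIES="7.5"

# /etc/portage/package.env -- apply the file above when building tensorflow
sci-libs/tensorflow tensorflow.conf
```

With this in place, portage exports the variable into the build environment for sci-libs/tensorflow only, instead of relying on the user's shell environment.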
Update: I was wrong about #3. emerge will find and use the TF_CUDA_COMPUTE_CAPABILITIES variable if it is set in the shell. So maybe the best way to fix this bug is to have the ebuild display a warning during the configure stage if this environment variable is not set. The warning could be something like:

WARNING: Tensorflow is being built with its default CUDA compute capabilities: 3.5 and 7.0. These may not be optimal for your GPU.

To configure Tensorflow with the CUDA compute capability that is optimal for your GPU, set the environment variable TF_CUDA_COMPUTE_CAPABILITIES and then re-emerge tensorflow. For example, to use CUDA capability 7.5, run:

$ TF_CUDA_COMPUTE_CAPABILITIES=7.5 emerge sci-libs/tensorflow

You can look up your GPU's CUDA compute capability at https://developer.nvidia.com/cuda-gpus or by running:

$ /opt/cuda/extras/demo_suite/deviceQuery | grep "CUDA Capability"
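A minimal sketch of the check this warning implies (plain POSIX sh, not the actual ebuild code; a real ebuild would use Portage's ewarn helper instead of echo, and the function name here is hypothetical):

```shell
# Hypothetical sketch of the proposed configure-time check: warn if the
# user has not set TF_CUDA_COMPUTE_CAPABILITIES, since the build will then
# fall back to the 3.5 and 7.0 defaults.
warn_default_cuda_caps() {
    if [ -z "${TF_CUDA_COMPUTE_CAPABILITIES}" ]; then
        # In an ebuild these would be ewarn lines.
        echo "WARNING: tensorflow is being built with its default CUDA compute"
        echo "capabilities (3.5 and 7.0), which may not be optimal for your GPU."
        echo "Set TF_CUDA_COMPUTE_CAPABILITIES (e.g. 7.5) and re-emerge."
    fi
}

warn_default_cuda_caps
```

The check is deliberately silent when the variable is set, so users who already configured it see no extra output.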
you can also just put TF_CUDA_COMPUTE_CAPABILITIES=7.5 in your make.conf, but yeah, I should put a note in the ebuild.
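The make.conf approach mentioned above is a single line (7.5 is an example value; use your GPU's capability):

```
# /etc/portage/make.conf -- applies to every emerge, not just tensorflow
TF_CUDA_COMPUTE_CAPABILITIES="7.5"
```

Unlike the per-package /etc/portage/env approach, this sets the variable globally for all builds, which is harmless here since only tensorflow reads it.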
The bug has been closed via the following commit(s):

https://gitweb.gentoo.org/repo/gentoo.git/commit/?id=a3dc69074dcad86d4c95e024d231c90c62483152

commit a3dc69074dcad86d4c95e024d231c90c62483152
Author:     Jason Zaman <perfinion@gentoo.org>
AuthorDate: 2019-12-08 11:18:22 +0000
Commit:     Jason Zaman <perfinion@gentoo.org>
CommitDate: 2019-12-08 17:25:26 +0000

    sci-libs/tensorflow: fix bazel, jsoncpp deps

    Also add a message about setting cuda compute capability

    Closes: https://bugs.gentoo.org/695428
    Closes: https://bugs.gentoo.org/697864
    Closes: https://bugs.gentoo.org/702222
    Package-Manager: Portage-2.3.79, Repoman-2.3.16
    Signed-off-by: Jason Zaman <perfinion@gentoo.org>

 sci-libs/tensorflow/Manifest                     |  1 +
 sci-libs/tensorflow/tensorflow-1.15.0_rc0.ebuild | 16 ++++++++++++++--
 sci-libs/tensorflow/tensorflow-2.0.0.ebuild      | 16 ++++++++++++++--
 sci-libs/tensorflow/tensorflow-2.1.0_rc0.ebuild  | 19 ++++++++++++++++---
 4 files changed, 45 insertions(+), 7 deletions(-)