The current ebuild seems to rely on the default behavior of the ./configure script shipped with TensorFlow, which includes a default CUDA version of 9.0. As a consequence, trying to install tensorflow with any CUDA version other than 9.0 fails.

Snippet of the build log:

 * python2_7: running bazel_multibuild_wrapper do_configure
WARNING: ignoring LD_PRELOAD in environment.
Extracting Bazel installation...
You have bazel 0.13.0- (@non-git) installed.
Please specify the CUDA SDK version you want to use, e.g. 7.0. [Leave empty to default to CUDA 9.0]:
Please specify the location where CUDA 9.0 toolkit is installed. Refer to README.md for more details. [Default is /opt/cuda]:
Invalid path to CUDA 9.0 toolkit. /opt/cuda/lib64/libcudart.so.9.0 cannot be found
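For what it's worth, TensorFlow's ./configure can be driven non-interactively through environment variables such as TF_CUDA_VERSION and CUDA_TOOLKIT_PATH, so the ebuild could detect the installed version instead of letting configure fall back to 9.0. A rough sketch of the idea (the parse_cuda_release helper is hypothetical, just for illustration; it assumes nvcc lives under /opt/cuda/bin as on Gentoo):

```shell
#!/bin/sh
# Hypothetical helper: extract e.g. "9.1" from nvcc's version banner,
# which contains a line like:
#   "Cuda compilation tools, release 9.1, V9.1.85"
parse_cuda_release() {
    printf '%s\n' "$1" | sed -n 's/.*release \([0-9][0-9]*\.[0-9][0-9]*\).*/\1/p'
}

# Sketch: feed the detected version to ./configure via the environment
# so it no longer defaults to CUDA 9.0.
if [ -x /opt/cuda/bin/nvcc ]; then
    TF_CUDA_VERSION="$(parse_cuda_release "$(/opt/cuda/bin/nvcc --version)")"
    export TF_CUDA_VERSION
    export CUDA_TOOLKIT_PATH=/opt/cuda
fi
```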
It turns out that this line in the ebuild is also problematic:

> export GCC_HOST_COMPILER_PATH=$(tc-getCC)

Here is some explanation: tc-getCC returns the *name* of the C compiler (usually GCC), such as "x86_64-pc-linux-gnu-gcc". However, the configuration script of TensorFlow expects an absolute path to the C compiler, so it complains that it cannot find the provided toolchain.

I worked around this by sneaking a "which" into that line, but that is probably not cross-compiler or prefix friendly, and my vim syntax highlighting flags it as an error. Maybe a better fix would be to sed the configuration script.
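As a sketch of the name-to-path resolution (using plain `sh` as a stand-in for whatever tc-getCC returns, so the example runs anywhere), the POSIX `command -v` builtin does the same job as which(1) without depending on an external binary:

```shell
#!/bin/sh
# Stand-in for $(tc-getCC); on Gentoo this would be something like
# "x86_64-pc-linux-gnu-gcc". "sh" is used here only so the sketch
# is runnable on any system.
cc_name=sh

# command -v resolves a name found in PATH to an absolute path,
# which is the form TensorFlow's configure script expects.
GCC_HOST_COMPILER_PATH="$(command -v "$cc_name")"
export GCC_HOST_COMPILER_PATH
echo "$GCC_HOST_COMPILER_PATH"
```

Whether this is any friendlier to cross-compiling or prefix setups than "which" is an open question; it only removes the dependency on an external tool.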
I can confirm the issue with CUDA 9.1. If I just build tensorflow from source, CUDA 9.1 works as expected. But I don't know how to change the ebuild so that it detects the installed CUDA version.
(In reply to Yi Yang from comment #1)
> It turns out that this line in the ebuild is also problematic:
>
> > export GCC_HOST_COMPILER_PATH=$(tc-getCC)
>
> Here is some explanation: tc-getCC returns the *name* of the C compiler
> (usually GCC), such as "x86_64-pc-linux-gnu-gcc". However, the configuration
> script of Tensorflow expects an absolute path of the C compiler. Hence it
> will complain that it cannot find the provided toolchain.
>
> I resolved this problem by sneaking a "which" in that line, but that is
> probably not so cross-compiler friendly or prefix friendly, as my vim syntax
> highlighting suggested that is an error. Maybe a better fix would be to sed
> the configuration script.

I changed this to =$(which $(tc-getCC)) in the 1.9_rc0 ebuild; does that one work any better? It's probably not correct in the long run, but it's worth a shot for now.
(In reply to younky.yang from comment #2)
> I can confirm the issue with CUDA 9.1, actually if I just built tensorflow
> from source, that CUDA 9.1 does work as expected. But I just don't know how
> to change the ebuild to make sure it can detect the installed CUDA version.

Can you show me the differences between the .tf_configure.bazelrc from your working from-source build and the one in /var/tmp/portage/sci-libs/tensorflow*/work/tensorflow*python3_6/.tf_configure.bazelrc? Hopefully they're configured differently in a way we can add to the ebuild.
I just pushed sci-libs/tensorflow-1.9.0_rc1-r2 to the tree; can you try it out?

I added stuff to set the cudnn and cuda versions properly now. If you need to set the cuda capabilities, you can set e.g. TF_CUDA_COMPUTE_CAPABILITIES="6.1" in your make.conf. I don't have a CUDA GPU yet, so it's still untested, but it does build for me at least.

I also added a system-libs USE flag which unbundles a bunch of deps. It would be cool if you guys could test with that on too.
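For example, the compute-capability override mentioned above would go in make.conf roughly like this (the "6.1" value is only an example; pick the capability that matches your GPU):

```shell
# /etc/portage/make.conf
# Build CUDA kernels only for the card(s) you actually have;
# multiple values are comma-separated, e.g. "5.2,6.1".
TF_CUDA_COMPUTE_CAPABILITIES="6.1"
```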
*** Bug 659462 has been marked as a duplicate of this bug. ***
(In reply to Jason Zaman from comment #5)
> I just pushed sci-libs/tensorflow-1.9.0_rc1-r2 to the tree, can you try it
> out.
>
> I added stuff to set the cudnn and cuda versions properly now.
> If you need to set the cuda capabilities, you can set eg
> TF_CUDA_COMPUTE_CAPABILITIES="6.1"
> in your make.conf. I don't have a CUDA GPU yet so its still untested but it
> does build for me at least.
>
> I also added a system-libs USE-flag which unbundles a bunch of deps. It
> would be cool if you guys could test with that on too.

Hi Jason, I've tried _rc1-r2, and it printed out a message like this: ">=dev-util/nvidia-cuda-toolkit-9.0[profiler] required by (sci-libs/tensorflow-1.9.0_rc1-r2:0/0::gentoo, ebuild scheduled for merge". But I can't emerge any >=nvidia-drivers-391.0, which does not support my GPU, and >=cuda-toolkit-9.0 depends on >=nvidia-drivers-391.0 (or other dependencies like this). So the situation is that I still can't emerge tensorflow right now, and I still need your help. Thanks!
(In reply to ZongyuZ from comment #7)
> Hi Jason, I've tried _rc1-r2, and it printed out a message like this:
> ">=dev-util/nvidia-cuda-toolkit-9.0[profiler] required by
> (sci-libs/tensorflow-1.9.0_rc1-r2:0/0::gentoo, ebuild scheduled for merge".
> But I can't emerge any >=nvidia-drivers-391.0, which is not capable for my
> gpu, and >=cuda-toolkit-9.0 depends on >=nvidia-drivers-391.0, or other
> dependencies like this.
> So the situation is that I still can't emerge tensorflow right now, and I
> still need youe help. Thanks!

I am preparing a bump to _rc2 now and will release it soon after some more tests. If I lower the nvidia-cuda-toolkit version requirement to >=8 instead, will that work for you? The versions before _rc1-r2 didn't do the cuda setup properly, so there's no point trying to get them working. You're better off going up instead.
> tests. If I lower the nvidia-cuda-toolkit version requirement to >=8
> instead, will that work for you?

I modified the ebuild of tensorflow-...-rc2 and tried to compile it, but it seems cuda-8.0 is not going to work with glibc-2.26. Here is a link: https://devtalk.nvidia.com/default/topic/1023776/-request-add-nvcc-compatibility-with-glibc-2-26/ So I think lowering the requirement won't work for me. Thank you for your help; I'll try to downgrade glibc later (or just give up cuda)...
tensorflow-1.9.0_rc2 should work with cuda now. I lowered the cuda requirement to >=8 too.