1. I copied protobuf_temp_fix_cuda10.1.patch to /etc/portage/patches/dev-libs/protobuf-3.6.1.3/ and rebuilt dev-libs/protobuf.
2. Make links:
   libcublas.so.10.1.0.105 -> libcublas.so.10.1
   libcufft.so.10.1.105 -> libcufft.so.10.1
   libcurand.so.10.1.105 -> libcurand.so.10.1
   libcusolver.so.10.1.105 -> libcusolver.so.10.1
3. The tensorflow build is successful with nvidia-cuda-toolkit-10.1.105 and cudnn-7.5.0.56.
4. ln -s /opt/cuda/lib64/libcublas.so.10.1.0.105 /usr/lib64/libcublas.so.10.1
5. python cifar10.py (simple tensorflow.python.keras model) works fine:

2019-04-06 14:22:40.053167: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:998] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-04-06 14:22:40.053719: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 0 with properties:
name: GeForce RTX 2070 major: 7 minor: 5 memoryClockRate(GHz): 1.815
pciBusID: 0000:01:00.0
totalMemory: 7.76GiB freeMemory: 6.76GiB
2019-04-06 14:22:40.053732: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0
2019-04-06 14:22:40.054277: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-04-06 14:22:40.054285: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990] 0
2019-04-06 14:22:40.054288: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0: N
2019-04-06 14:22:40.054476: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 6579 MB memory) -> physical GPU (device: 0, name: GeForce RTX 2070, pci bus id: 0000:01:00.0, compute capability: 7.5)
WARNING:tensorflow:From /usr/lib64/python3.6/site-packages/tensorflow/python/ops/resource_variable_ops.py:435: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
Instructions for updating:
Colocations handled automatically by placer.
WARNING:tensorflow:From /usr/lib64/python3.6/site-packages/tensorflow/python/keras/layers/core.py:143: calling dropout (from tensorflow.python.ops.nn_ops) with keep_prob is deprecated and will be removed in a future version.
Instructions for updating:
Please use `rate` instead of `keep_prob`. Rate should be set to `rate = 1 - keep_prob`.
Train on 45000 samples, validate on 5000 samples
WARNING:tensorflow:From /usr/lib64/python3.6/site-packages/tensorflow/python/ops/math_ops.py:3066: to_int32 (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.cast instead.
2019-04-06 14:22:40.858529: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0
2019-04-06 14:22:40.858574: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-04-06 14:22:40.858580: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990] 0
2019-04-06 14:22:40.858584: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0: N
2019-04-06 14:22:40.858792: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 6579 MB memory) -> physical GPU (device: 0, name: GeForce RTX 2070, pci bus id: 0000:01:00.0, compute capability: 7.5)
Epoch 1/25
2019-04-06 14:22:41.397546: I tensorflow/stream_executor/dso_loader.cc:152] successfully opened CUDA library libcublas.so.10.1 locally
 - 6s - loss: 1.7876 - acc: 0.3388 - val_loss: 1.4605 - val_acc: 0.4720
Epoch 2/25
 - 5s - loss: 1.3256 - acc: 0.5225 - val_loss: 1.1906 - val_acc: 0.5732
Epoch 3/25
 - 5s - loss: 1.1481 - acc: 0.5893 - val_loss: 0.9876 - val_acc: 0.6540
Epoch 4/25
 - 5s - loss: 1.0334 - acc: 0.6350 - val_loss: 0.9249 - val_acc: 0.6732
Epoch 5/25
 - 5s - loss: 0.9532 - acc: 0.6674 - val_loss: 0.8389 - val_acc: 0.7138
.......
source: https://github.com/tensorflow/tensorflow/issues/26155#issuecomment-476705051
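For anyone repeating the symlink steps (2 and 4) from the list above, here is a rough sketch of how they could be scripted. The function name link_cuda_libs and its two directory arguments are my own invention for illustration. The library file names are the ones listed above, and /opt/cuda/lib64 and /usr/lib64 are taken from the ln -s line in step 4. I am only assuming that the other three libraries live in the same source directory as libcublas.

```shell
#!/bin/sh
# Sketch only: link_cuda_libs is a hypothetical helper, not part of any package.
# Usage: link_cuda_libs SRC_DIR DEST_DIR
# Creates DEST_DIR/<short name> -> SRC_DIR/<fully versioned name> for each pair.
link_cuda_libs() {
    src_dir=$1
    dest_dir=$2
    # "fully-versioned-file short-soname" pairs from the comment above
    for pair in \
        "libcublas.so.10.1.0.105 libcublas.so.10.1" \
        "libcufft.so.10.1.105 libcufft.so.10.1" \
        "libcurand.so.10.1.105 libcurand.so.10.1" \
        "libcusolver.so.10.1.105 libcusolver.so.10.1"
    do
        target=${pair% *}   # text before the space
        name=${pair#* }     # text after the space
        if [ -e "$src_dir/$target" ]; then
            # -f replaces an existing link of the same name
            ln -sf "$src_dir/$target" "$dest_dir/$name"
        else
            echo "skipping $name: $src_dir/$target not found" >&2
        fi
    done
}
```

With the paths from step 4 this would be called as root as link_cuda_libs /opt/cuda/lib64 /usr/lib64. Files that are missing are skipped with a message instead of leaving a dangling link.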
So you wanted to talk about a patch?
Yes. This patch for dev-libs/protobuf works for me.
I can confirm that tensorflow compiles without problems this way (CUDA 10.1). However, it did not compile with my old version of dev-libs/flatbuffers-1.8.0. It works fine with flatbuffers-1.10.0, so the RDEPEND section of the tensorflow ebuild "(python? ( ..." should be updated. https://github.com/tensorflow/tensorflow/commit/b62cadc1513a73c1673094c9e35421c8a6c17645
Use tensorflow-1.14.0 for CUDA 10.1 instead.
The problem is still there with tensorflow-1.14 and CUDA 10.1.105. I had to update to dev-util/nvidia-cuda-toolkit-10.1.168 (which can easily be created from dev-util/nvidia-cuda-toolkit-10.1.105-r1).