Created attachment 722338 [details]
Tensorflow 2.5 build log

Hej,

I tried to compile TensorFlow 2.4 and 2.5; both fail. A couple of months ago TensorFlow 2.4 compiled just fine, but I had to uninstall it because it was blocking the GCC update from 10.2 to 10.3 and the Python 3.8/3.9 update. Now I need TensorFlow again, so I masked GCC >= 10.3 and added python_targets_python3_8 to (hopefully) all packages TensorFlow depends on.

(In the meantime the system was upgraded from a 4-core Ivy Bridge CPU with 24 GB RAM to a 6-core Skylake CPU with 64 GB RAM.)

As a desperate measure I also tried to compile with GCC 9.3.0-r2, and after that did an "emerge -e @world" with GCC 10.2 - neither solved the issue. I removed "-march=native" from CFLAGS, but it didn't help. I even added 64 GiB as a swapfile - just in case - but it didn't help. I also tried the minimum and maximum supported versions of bazel and other dependencies (certainly not an exhaustive permutation, admittedly...).

The errors I see in the attached build log:

[...]
[12,095 / 19,390] 6 actions running
    Compiling tensorflow/core/kernels/list_kernels.cu.cc; 24s local
    Compiling tensorflow/core/kernels/dynamic_stitch_op_gpu.cu.cc; 10s local
    Compiling .../kernels/strided_slice_op_gpu_number_types.cu.cc; 7s local
    Compiling tensorflow/core/kernels/example_parsing_ops.cc; 3s local
    Compiling tensorflow/core/kernels/linalg/matrix_set_diag_op.cc; 2s local
    Compiling tensorflow/core/kernels/list_kernels.cc; 1s local
ERROR: /var/tmp/portage/sci-libs/tensorflow-2.5.0/work/tensorflow-2.5.0-python3_8/tensorflow/core/kernels/BUILD:4393:18: C++ compilation of rule '//tensorflow/core/kernels:example_parsing_ops' failed (Exit 1): crosstool_wrapper_driver_is_not_gcc failed: error executing command
  (cd /var/tmp/portage/sci-libs/tensorflow-2.5.0/work/tensorflow-2.5.0-python3_8-bazel-base/execroot/org_tensorflow && \
  exec env - \
    CUDA_TOOLKIT_PATH=/opt/cuda \
    GCC_HOST_COMPILER_PATH=/usr/x86_64-pc-linux-gnu/gcc-bin/10.2.0/x86_64-pc-linux-gnu-gcc \
    HOME=/var/tmp/portage/sci-libs/tensorflow-2.5.0/homedir \
    KERAS_HOME=/var/tmp/portage/sci-libs/tensorflow-2.5.0/temp/.keras \
    PATH=/var/tmp/portage/sci-libs/tensorflow-2.5.0/temp/python3.8/bin:/usr/lib/portage/python3.9/ebuild-helpers/xattr:/usr/lib/portage/python3.9/ebuild-helpers:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/opt/bin:/usr/lib/llvm/12/bin:/usr/lib/llvm/11/bin:/opt/cuda/bin \
    PWD=/proc/self/cwd \
    PYTHON_BIN_PATH=/usr/bin/python3.8 \
    PYTHON_LIB_PATH=/usr/lib/python3.8/site-packages \
    TF2_BEHAVIOR=1 \
    TF_CUDA_COMPUTE_CAPABILITIES=6.1 \
    TF_CUDA_PATHS=/opt/cuda \
    TF_CUDA_VERSION=11.1 \
    TF_CUDNN_VERSION=8.0 \
    TF_SYSTEM_LIBS=absl_py,astor_archive,astunparse_archive,boringssl,com_github_googlecloudplatform_google_cloud_cpp,com_github_grpc_grpc,com_google_protobuf,curl,cython,dill_archive,double_conversion,enum34_archive,flatbuffers,functools32_archive,gast_archive,gif,hwloc,icu,jsoncpp_git,libjpeg_turbo,lmdb,nasm,nsync,opt_einsum_archive,org_sqlite,pasta,pcre,png,pybind11,six_archive,snappy,tblib_archive,termcolor_archive,typing_extensions_archive,wrapt,zlib \
  external/local_config_cuda/crosstool/clang/bin/crosstool_wrapper_driver_is_not_gcc -MD -MF bazel-out/k8-opt/bin/tensorflow/core/kernels/_objs/example_parsing_ops/example_parsing_ops.pic.d '-frandom-seed=bazel-out/k8-opt/bin/tensorflow/core/kernels/_objs/example_parsing_ops/example_parsing_ops.pic.o' -DTF_USE_SNAPPY -DEIGEN_MPL2_ONLY '-DEIGEN_MAX_ALIGN_BYTES=64' -iquote .
-iquote bazel-out/k8-opt/bin -iquote external/com_google_absl -iquote bazel-out/k8-opt/bin/external/com_google_absl -iquote external/nsync -iquote bazel-out/k8-opt/bin/external/nsync -iquote external/eigen_archive -iquote bazel-out/k8-opt/bin/external/eigen_archive -iquote external/gif -iquote bazel-out/k8-opt/bin/external/gif -iquote external/libjpeg_turbo -iquote bazel-out/k8-opt/bin/external/libjpeg_turbo -iquote external/com_google_protobuf -iquote bazel-out/k8-opt/bin/external/com_google_protobuf -iquote external/com_googlesource_code_re2 -iquote bazel-out/k8-opt/bin/external/com_googlesource_code_re2 -iquote external/farmhash_archive -iquote bazel-out/k8-opt/bin/external/farmhash_archive -iquote external/fft2d -iquote bazel-out/k8-opt/bin/external/fft2d -iquote external/highwayhash -iquote bazel-out/k8-opt/bin/external/highwayhash -iquote external/zlib -iquote bazel-out/k8-opt/bin/external/zlib -iquote external/local_config_cuda -iquote bazel-out/k8-opt/bin/external/local_config_cuda -iquote external/local_config_rocm -iquote bazel-out/k8-opt/bin/external/local_config_rocm -iquote external/local_config_tensorrt -iquote bazel-out/k8-opt/bin/external/local_config_tensorrt -iquote external/double_conversion -iquote bazel-out/k8-opt/bin/external/double_conversion -iquote external/snappy -iquote bazel-out/k8-opt/bin/external/snappy -iquote external/curl -iquote bazel-out/k8-opt/bin/external/curl -iquote external/boringssl -iquote bazel-out/k8-opt/bin/external/boringssl -iquote external/jsoncpp_git -iquote bazel-out/k8-opt/bin/external/jsoncpp_git -Ibazel-out/k8-opt/bin/external/local_config_cuda/cuda/_virtual_includes/cuda_headers_virtual -Ibazel-out/k8-opt/bin/external/local_config_tensorrt/_virtual_includes/tensorrt_headers -Ibazel-out/k8-opt/bin/external/local_config_cuda/cuda/_virtual_includes/cudnn_header -isystem third_party/eigen3/mkl_include -isystem bazel-out/k8-opt/bin/third_party/eigen3/mkl_include -isystem external/eigen_archive -isystem bazel-out/k8-opt/bin/external/eigen_archive -isystem external/farmhash_archive/src -isystem bazel-out/k8-opt/bin/external/farmhash_archive/src -isystem external/local_config_cuda/cuda -isystem bazel-out/k8-opt/bin/external/local_config_cuda/cuda -isystem external/local_config_cuda/cuda/cuda/include -isystem bazel-out/k8-opt/bin/external/local_config_cuda/cuda/cuda/include -isystem external/local_config_rocm/rocm -isystem bazel-out/k8-opt/bin/external/local_config_rocm/rocm -isystem external/local_config_rocm/rocm/rocm/include -isystem bazel-out/k8-opt/bin/external/local_config_rocm/rocm/rocm/include -isystem external/local_config_rocm/rocm/rocm/include/rocrand -isystem bazel-out/k8-opt/bin/external/local_config_rocm/rocm/rocm/include/rocrand -isystem external/local_config_rocm/rocm/rocm/include/roctracer -isystem bazel-out/k8-opt/bin/external/local_config_rocm/rocm/rocm/include/roctracer -Wno-builtin-macro-redefined '-D__DATE__="redacted"' '-D__TIMESTAMP__="redacted"' '-D__TIME__="redacted"' -fPIC -U_FORTIFY_SOURCE '-D_FORTIFY_SOURCE=1' -fstack-protector -Wall -fno-omit-frame-pointer -no-canonical-prefixes -fno-canonical-system-headers -DNDEBUG -g0 -O2 -ffunction-sections -fdata-sections -w -DAUTOLOAD_DYNAMIC_KERNELS -I/usr/include/jsoncpp '-std=c++14' -O2 -pipe -msse -msse2 -msse3 -msse4.1 -msse4.2 -mavx -mavx2 -mfma -DEIGEN_AVOID_STL_ARRAY -Iexternal/gemmlowp -Wno-sign-compare '-ftemplate-depth=900' -fno-exceptions '-DGOOGLE_CUDA=1' '-DTENSORFLOW_USE_NVCC=1' '-DTENSORFLOW_USE_XLA=1' -DINTEL_MKL -msse3 -pthread -DNV_CUDNN_DISABLE_EXCEPTION 
'-DGOOGLE_CUDA=1' -DNV_CUDNN_DISABLE_EXCEPTION '-DTENSORFLOW_USE_XLA=1' '-DINTEL_MKL=1' -c tensorflow/core/kernels/example_parsing_ops.cc -o bazel-out/k8-opt/bin/tensorflow/core/kernels/_objs/example_parsing_ops/example_parsing_ops.pic.o)
Execution platform: @local_execution_config_platform//:platform
[12,096 / 19,390] 5 actions running
    Compiling tensorflow/core/kernels/list_kernels.cu.cc; 25s local
    Compiling tensorflow/core/kernels/dynamic_stitch_op_gpu.cu.cc; 11s local
    Compiling .../kernels/strided_slice_op_gpu_number_types.cu.cc; 8s local
    Compiling tensorflow/core/kernels/linalg/matrix_set_diag_op.cc; 3s local
    Compiling tensorflow/core/kernels/list_kernels.cc; 1s local
In file included from ./tensorflow/core/framework/op_kernel.h:35,
                 from ./tensorflow/core/framework/numeric_op.h:19,
                 from tensorflow/core/kernels/example_parsing_ops.cc:27:
tensorflow/core/kernels/example_parsing_ops.cc: In member function ‘virtual void tensorflow::DecodeJSONExampleOp::Compute(tensorflow::OpKernelContext*)’:
tensorflow/core/kernels/example_parsing_ops.cc:1221:57: error: ‘class google::protobuf::util::status_internal::Status’ has no member named ‘error_message’; did you mean ‘error_message_’?
 1221 |                         string(status.error_message())));
      |                                       ^~~~~~~~~~~~~
./tensorflow/core/framework/op_requires.h:45:46: note: in definition of macro ‘OP_REQUIRES’
   45 |     (CTX)->CtxFailure(__FILE__, __LINE__, (STATUS)); \
      |                                              ^~~~~~
tensorflow/core/kernels/example_parsing_ops.cc:1221:57: error: ‘std::string google::protobuf::util::status_internal::Status::error_message_’ is private within this context
 1221 |                         string(status.error_message())));
      |                                       ^~~~~~~~~~~~~
./tensorflow/core/framework/op_requires.h:45:46: note: in definition of macro ‘OP_REQUIRES’
   45 |     (CTX)->CtxFailure(__FILE__, __LINE__, (STATUS)); \
      |                                              ^~~~~~
In file included from /usr/include/google/protobuf/stubs/logging.h:36,
                 from /usr/include/google/protobuf/io/coded_stream.h:150,
                 from bazel-out/k8-opt/bin/tensorflow/core/protobuf/error_codes.pb.h:23,
                 from ./tensorflow/core/platform/status.h:30,
                 from ./tensorflow/core/lib/core/status.h:19,
                 from ./tensorflow/core/lib/monitoring/counter.h:37,
                 from ./tensorflow/core/framework/metrics.h:19,
                 from ./tensorflow/core/common_runtime/metrics.h:22,
                 from tensorflow/core/kernels/example_parsing_ops.cc:23:
/usr/include/google/protobuf/stubs/status.h:97:15: note: declared private here
   97 |   std::string error_message_;
      |               ^~~~~~~~~~~~~~
[12,096 / 19,390] 5 actions running
    Compiling tensorflow/core/kernels/list_kernels.cu.cc; 25s local
    Compiling tensorflow/core/kernels/dynamic_stitch_op_gpu.cu.cc; 11s local
    Compiling .../kernels/strided_slice_op_gpu_number_types.cu.cc; 8s local
    Compiling tensorflow/core/kernels/linalg/matrix_set_diag_op.cc; 3s local
    Compiling tensorflow/core/kernels/list_kernels.cc; 1s local
INFO: Elapsed time: 2826.482s, Critical Path: 141.00s
[12,101 / 19,390] checking cached actions
INFO: 12101 processes: 5934 internal, 6167 local.
[12,101 / 19,390] checking cached actions
FAILED: Build did NOT complete successfully
FAILED: Build did NOT complete successfully
 * ERROR: sci-libs/tensorflow-2.5.0::gentoo failed (compile phase):
 *   ebazel failed
 *
 * Call stack:
 *   ebuild.sh, line 127:  Called src_compile
 *   environment, line 4158:  Called ebazel 'build' '//tensorflow:libtensorflow_framework.so' '//tensorflow:libtensorflow.so'
 *   environment, line 2510:  Called die
 * The specific snippet of code:
 *       "${@}" || die "ebazel failed"
 *
 * If you need support, post the output of `emerge --info '=sci-libs/tensorflow-2.5.0::gentoo'`,
 * the complete build log and the output of `emerge -pqv '=sci-libs/tensorflow-2.5.0::gentoo'`.
 * The complete build log is located at '/var/log/portage/sci-libs:tensorflow-2.5.0:20210706-090837.log'.
 * For convenience, a symlink to the build log is located at '/var/tmp/portage/sci-libs/tensorflow-2.5.0/temp/build.log'.
 * The ebuild environment file is located at '/var/tmp/portage/sci-libs/tensorflow-2.5.0/temp/environment'.
 * Working directory: '/var/tmp/portage/sci-libs/tensorflow-2.5.0/work/tensorflow-2.5.0-python3_8'
 * S: '/var/tmp/portage/sci-libs/tensorflow-2.5.0/work/tensorflow-2.5.0'

Any ideas on how to make TensorFlow compile again are welcome :)

Cheers,
Bjoern
Created attachment 722341 [details] emerge --info
Created attachment 722344 [details] emerge -ept sci-libs/tensorflow
This has been the case since Protobuf 3.16.0:

https://github.com/protocolbuffers/protobuf/pull/8354
https://github.com/protocolbuffers/protobuf/commit/9ad97629be72eeecf8bc9fe8145e55ceaeab6b78#diff-26f14c21bd27b6500347fdacdeea49b8bccde636aab2ecae545515e76a5a48bdL96-L98

As can be seen just below the deleted function in that diff, the solution is to use message() instead of error_message(). (Both of them were defined identically.)
(In reply to Arfrever Frehtes Taifersar Arahesis from comment #3)
> This is already since ProtoBuf 3.16.0:
> 
> https://github.com/protocolbuffers/protobuf/pull/8354
> https://github.com/protocolbuffers/protobuf/commit/
> 9ad97629be72eeecf8bc9fe8145e55ceaeab6b78#diff-
> 26f14c21bd27b6500347fdacdeea49b8bccde636aab2ecae545515e76a5a48bdL96-L98
> 
> As seen below this deleted function, the solution is to use message()
> instead of error_message(). (Both of them were defined identically.)

Thanks for that hint!

I masked >=dev-libs/protobuf-3.16.0 and >=dev-python/protobuf-python-3.16.0, and the build now seems to be past the point I reported above - it has been running for ~2 hours, compared to failing after ~30 minutes before masking protobuf.

The most straightforward way to fix this would be to require dev-libs/protobuf-3.15.8 and dev-python/protobuf-python-3.15.8 in the tensorflow-2.5 ebuild - am I really the first one to trip over the incompatibility of protobuf >=3.16.0 with TensorFlow on Gentoo?

If it is not addressed upstream, would a patch to address the root cause, as mentioned by Arfrever Frehtes Taifersar Arahesis, be feasible?

I'll post an update once TensorFlow has compiled entirely to confirm the above.

Cheers,
Bjoern
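P.S. For anyone who wants to reproduce the mask, it boils down to two entries - a minimal sketch (the file name under /etc/portage/package.mask/ is arbitrary, and if package.mask is a single file on your system, just append the two atoms to it):

# Sketch: keep protobuf below 3.16 until TensorFlow is patched.
cat >> /etc/portage/package.mask/protobuf-tensorflow <<'EOF'
>=dev-libs/protobuf-3.16.0
>=dev-python/protobuf-python-3.16.0
EOF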
After 5 hours, 32 minutes and 19 seconds Tensorflow 2.5 compiled successfully and is working as expected. The solution to mask >=dev-libs/protobuf-3.16.0 & >=dev-python/protobuf-python-3.16.0 worked for me. Cheers, Bjoern
Okay, maybe downgrading protobuf is not the way to go, or it is simply a bug in tensorflow (https://github.com/tensorflow/tensorflow/issues/50545):

    model = keras.Sequential(
  File "/usr/lib/python3.8/site-packages/tensorflow/python/training/tracking/base.py", line 522, in _method_wrapper
    result = method(self, *args, **kwargs)
  File "/usr/lib/python3.8/site-packages/tensorflow/python/keras/engine/sequential.py", line 114, in __init__
    super(functional.Functional, self).__init__(  # pylint: disable=bad-super-call
  File "/usr/lib/python3.8/site-packages/tensorflow/python/training/tracking/base.py", line 522, in _method_wrapper
    result = method(self, *args, **kwargs)
  File "/usr/lib/python3.8/site-packages/tensorflow/python/keras/engine/training.py", line 318, in __init__
    self._init_batch_counters()
  File "/usr/lib/python3.8/site-packages/tensorflow/python/training/tracking/base.py", line 522, in _method_wrapper
    result = method(self, *args, **kwargs)
  File "/usr/lib/python3.8/site-packages/tensorflow/python/keras/engine/training.py", line 326, in _init_batch_counters
    self._train_counter = variables.Variable(0, dtype='int64', aggregation=agg)
  File "/usr/lib/python3.8/site-packages/tensorflow/python/ops/variables.py", line 262, in __call__
    return cls._variable_v2_call(*args, **kwargs)
  File "/usr/lib/python3.8/site-packages/tensorflow/python/ops/variables.py", line 244, in _variable_v2_call
    return previous_getter(
  File "/usr/lib/python3.8/site-packages/tensorflow/python/ops/variables.py", line 237, in <lambda>
    previous_getter = lambda **kws: default_variable_creator_v2(None, **kws)
  File "/usr/lib/python3.8/site-packages/tensorflow/python/ops/variable_scope.py", line 2662, in default_variable_creator_v2
    return resource_variable_ops.ResourceVariable(
  File "/usr/lib/python3.8/site-packages/tensorflow/python/ops/variables.py", line 264, in __call__
    return super(VariableMetaclass, cls).__call__(*args, **kwargs)
  File "/usr/lib/python3.8/site-packages/tensorflow/python/ops/resource_variable_ops.py", line 1584, in __init__
    self._init_from_args(
  File "/usr/lib/python3.8/site-packages/tensorflow/python/ops/resource_variable_ops.py", line 1738, in _init_from_args
    handle = eager_safe_variable_handle(
  File "/usr/lib/python3.8/site-packages/tensorflow/python/ops/resource_variable_ops.py", line 237, in eager_safe_variable_handle
    return _variable_handle_from_shape_and_dtype(shape, dtype, shared_name, name,
  File "/usr/lib/python3.8/site-packages/tensorflow/python/ops/resource_variable_ops.py", line 177, in _variable_handle_from_shape_and_dtype
    cpp_shape_inference_pb2.CppShapeInferenceResult.HandleShapeAndType(
TypeError: Parameter to MergeFrom() must be instance of same class: expected tensorflow.TensorShapeProto got tensorflow.TensorShapeProto.
I had the same experience (failed with the later protobuf, masked it, thought it worked with the earlier one -- then ran into the runtime error above when running a test suite).

I was able to get past the error_message-member issue with the patch suggested here: https://bugs.gentoo.org/800824#c3

```
--- a/tensorflow/core/kernels/example_parsing_ops.cc	2021-07-07 11:12:34.110293208 +0200
+++ b/tensorflow/core/kernels/example_parsing_ops.cc	2021-07-07 11:13:04.013291922 +0200
@@ -1218,7 +1218,7 @@
         resolver_.get(), "type.googleapis.com/tensorflow.Example", &in, &out);
     OP_REQUIRES(ctx, status.ok(),
                 errors::InvalidArgument("Error while parsing JSON: ",
-                                        string(status.error_message())));
+                                        string(status.message())));
   }
 }
```

but I am running into some later build issues (portions of the error messages/log are excerpted below):

```
ERROR: /var/tmp/portage/sci-libs/tensorflow-2.5.0/work/tensorflow-2.5.0-python3_8/tensorflow/core/kernels/BUILD:5337:18: C++ compilation of rule '//tensorflow/core/kernels:multinomial_op_gpu' failed (Exit 2): crosstool_wrapper_driver_is_not_gcc failed: error executing command
  (cd /var/tmp/portage/sci-libs/tensorflow-2.5.0/work/tensorflow-2.5.0-python3_8-bazel-base/execroot/org_tensorflow && \
...
external/eigen_archive/unsupported/Eigen/CXX11/src/Tensor/TensorMap.h(318): error: unrecognized token
external/eigen_archive/unsupported/Eigen/CXX11/src/Tensor/TensorMap.h(318): error: expected a ","

2 errors detected in the compilation of "tensorflow/core/kernels/multinomial_op_gpu.cu.cc".
```
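In case it helps others: instead of patching the unpacked sources by hand for every rebuild, Portage's user-patch mechanism might work - a minimal sketch, assuming the tensorflow ebuild applies user patches via eapply_user (the patch file name below is made up):

# Sketch: let Portage apply the fix automatically on each rebuild
# (assumes the ebuild runs eapply_user; the patch file name is hypothetical).
mkdir -p /etc/portage/patches/sci-libs/tensorflow-2.5.0
cp example_parsing_ops-status-message.patch /etc/portage/patches/sci-libs/tensorflow-2.5.0/
emerge -1av sci-libs/tensorflow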
Created attachment 722599 [details]
sci-libs:tensorflow-2.5.0:20210707-090010.log.bz2

Same partial success here... after hours of compile time...

ebuild /usr/portage/sci-libs/tensorflow/tensorflow-2.5.0.ebuild unpack
sed -e 's|status.error_message|status.message|g' -i /dev/shm/portage/sci-libs/tensorflow-2.5.0/work/tensorflow-2.5.0/tensorflow/core/kernels/example_parsing_ops.cc
ebuild /usr/portage/sci-libs/tensorflow/tensorflow-2.5.0.ebuild compile

[...]
[23,726 / 25,196] Compiling tensorflow/compiler/tf2xla/kernels/stateless_random_ops.cc [for host]; 15s local ... (12 actions, 11 running)
[26,009 / 27,736] Compiling tensorflow/core/kernels/unique_op_gpu.cu.cc; 86s local ... (12 actions, 11 running)
[27,937 / 29,220] Compiling tensorflow/compiler/xla/service/spmd/spmd_partitioner.cc; 22s local ... (12 actions, 11 running)
ERROR: /dev/shm/portage/sci-libs/tensorflow-2.5.0/work/tensorflow-2.5.0-python3_8-bazel-base/external/nccl_archive/BUILD.bazel:54:17: C++ compilation of rule '@nccl_archive//:device_lib' failed (Exit 6): crosstool_wrapper_driver_is_not_gcc failed: error executing command
  (cd /dev/shm/portage/sci-libs/tensorflow-2.5.0/work/tensorflow-2.5.0-python3_8-bazel-base/execroot/org_tensorflow && \
  exec env - \
    CUDA_TOOLKIT_PATH=/opt/cuda \
    GCC_HOST_COMPILER_PATH=/usr/x86_64-pc-linux-gnu/gcc-bin/10.2.0/x86_64-pc-linux-gnu-gcc \
    HOME=/dev/shm/portage/sci-libs/tensorflow-2.5.0/homedir \
    KERAS_HOME=/dev/shm/portage/sci-libs/tensorflow-2.5.0/temp/.keras \
    PATH=/dev/shm/portage/sci-libs/tensorflow-2.5.0/temp/python3.8/bin:/dev/shm/portage/sci-libs/tensorflow-2.5.0/temp/python3.8/bin:/usr/lib/portage/python3.9/ebuild-helpers/xattr:/usr/lib/portage/python3.9/ebuild-helpers:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/opt/bin:/usr/lib/llvm/12/bin:/usr/lib/llvm/11/bin:/opt/cuda/bin \
    PWD=/proc/self/cwd \
    PYTHON_BIN_PATH=/usr/bin/python3.8 \
    PYTHON_LIB_PATH=/usr/lib/python3.8/site-packages \
    TF2_BEHAVIOR=1 \
    TF_CUDA_COMPUTE_CAPABILITIES=6.1 \
    TF_CUDA_PATHS=/opt/cuda \
    TF_CUDA_VERSION=11.1 \
    TF_CUDNN_VERSION=8.0 \
    TF_SYSTEM_LIBS=absl_py,astor_archive,astunparse_archive,boringssl,com_github_googlecloudplatform_google_cloud_cpp,com_github_grpc_grpc,com_google_protobuf,curl,cython,dill_archive,double_conversion,enum34_archive,flatbuffers,functools32_archive,gast_archive,gif,hwloc,icu,jsoncpp_git,libjpeg_turbo,lmdb,nasm,nsync,opt_einsum_archive,org_sqlite,pasta,pcre,png,pybind11,six_archive,snappy,tblib_archive,termcolor_archive,typing_extensions_archive,wrapt,zlib \
  external/local_config_cuda/crosstool/clang/bin/crosstool_wrapper_driver_is_not_gcc -MD -MF bazel-out/k8-opt/bin/external/nccl_archive/_objs/device_lib/max_f32_reduce.cu.d '-frandom-seed=bazel-out/k8-opt/bin/external/nccl_archive/_objs/device_lib/max_f32_reduce.cu.o' -iquote external/nccl_archive -iquote bazel-out/k8-opt/bin/external/nccl_archive -iquote external/local_config_cuda -iquote bazel-out/k8-opt/bin/external/local_config_cuda -Ibazel-out/k8-opt/bin/external/nccl_archive/_virtual_includes/device_hdrs -Ibazel-out/k8-opt/bin/external/nccl_archive/_virtual_includes/include_hdrs -Ibazel-out/k8-opt/bin/external/local_config_cuda/cuda/_virtual_includes/cuda_headers_virtual -Ibazel-out/k8-opt/bin/external/nccl_archive/_virtual_includes/src_hdrs -isystem external/local_config_cuda/cuda -isystem bazel-out/k8-opt/bin/external/local_config_cuda/cuda -isystem external/local_config_cuda/cuda/cuda/include -isystem bazel-out/k8-opt/bin/external/local_config_cuda/cuda/cuda/include -Wno-builtin-macro-redefined '-D__DATE__="redacted"' '-D__TIMESTAMP__="redacted"' '-D__TIME__="redacted"' -fPIE -U_FORTIFY_SOURCE '-D_FORTIFY_SOURCE=1' -fstack-protector -Wall -fno-omit-frame-pointer -no-canonical-prefixes -fno-canonical-system-headers -DNDEBUG -g0 -O2 -ffunction-sections -fdata-sections -w -DAUTOLOAD_DYNAMIC_KERNELS -I/usr/include/jsoncpp '-std=c++14' '-march=native' -O2 -pipe -msse -msse2 -msse3 -msse4.1 -msse4.2 -mavx -mavx2 -mfma -x cuda '-DGOOGLE_CUDA=1' '-Xcuda-fatbinary=--compress-all' '--no-cuda-include-ptx=all' '--cuda-include-ptx=sm_61' '--cuda-gpu-arch=sm_61' -nvcc_options 'relocatable-device-code=true' -nvcc_options 'ptxas-options=-maxrregcount=96' -c bazel-out/k8-opt/bin/external/nccl_archive/src/collectives/device/max_f32_reduce.cu.cc -o bazel-out/k8-opt/bin/external/nccl_archive/_objs/device_lib/max_f32_reduce.cu.o)
Execution platform: @local_execution_config_platform//:platform
double free or corruption (out)
nvcc error   : 'cicc' died due to signal 6
Target //tensorflow/tools/pip_package:build_pip_package failed to build
INFO: Elapsed time: 9494.469s, Critical Path: 261.97s
INFO: 24170 processes: 3290 internal, 20880 local.
FAILED: Build did NOT complete successfully
FAILED: Build did NOT complete successfully
 * ERROR: sci-libs/tensorflow-2.5.0::gentoo failed (compile phase):
 *   ebazel failed
 *
 * Call stack:
 *     ebuild.sh, line 127:  Called src_compile
 *   environment, line 4168:  Called python_foreach_impl 'run_in_build_dir' 'do_compile'
 *   environment, line 3760:  Called multibuild_foreach_variant '_python_multibuild_wrapper' 'run_in_build_dir' 'do_compile'
 *   environment, line 3236:  Called _multibuild_run '_python_multibuild_wrapper' 'run_in_build_dir' 'do_compile'
 *   environment, line 3234:  Called _python_multibuild_wrapper 'run_in_build_dir' 'do_compile'
 *   environment, line 1089:  Called run_in_build_dir 'do_compile'
 *   environment, line 4140:  Called do_compile
 *   environment, line 4164:  Called ebazel 'build' '//tensorflow/tools/pip_package:build_pip_package'
 *   environment, line 2512:  Called die
 * The specific snippet of code:
 *       "${@}" || die "ebazel failed"
 *
 * If you need support, post the output of `emerge --info '=sci-libs/tensorflow-2.5.0::gentoo'`,
 * the complete build log and the output of `emerge -pqv '=sci-libs/tensorflow-2.5.0::gentoo'`.
 * The complete build log is located at '/var/log/portage/sci-libs:tensorflow-2.5.0:20210707-090010.log'.
 * For convenience, a symlink to the build log is located at '/dev/shm/portage/sci-libs/tensorflow-2.5.0/temp/build.log'.
 * The ebuild environment file is located at '/dev/shm/portage/sci-libs/tensorflow-2.5.0/temp/environment'.
 * Working directory: '/dev/shm/portage/sci-libs/tensorflow-2.5.0/work/tensorflow-2.5.0-python3_8'
 * S: '/dev/shm/portage/sci-libs/tensorflow-2.5.0/work/tensorflow-2.5.0'

Any ideas how to get around that?
Oookay, apparently 64 GB of RAM is not enough when using -j12 and /dev/shm to compile TensorFlow.

I switched back to the default portage dirs

#PORTAGE_TMPFS="/dev/shm"
#PORTAGE_TMPDIR="/dev/shm"
#BUILD_PREFIX="/dev/shm"

and used 6 instead of 12 jobs

MAKEOPTS="-j6"

After that, the following procedure worked for me to compile sci-libs/tensorflow-2.5.0:

ebuild /usr/portage/sci-libs/tensorflow/tensorflow-2.5.0.ebuild unpack
sed -e 's|status.error_message|status.message|g' -i /var/tmp/portage/sci-libs/tensorflow-2.5.0/work/tensorflow-2.5.0/tensorflow/core/kernels/example_parsing_ops.cc
MAKEOPTS="-j6" ; ebuild /usr/portage/sci-libs/tensorflow/tensorflow-2.5.0.ebuild compile

 * Package:    sci-libs/tensorflow-2.5.0
 * Repository: gentoo
 * Maintainer: perfinion@gentoo.org
 * USE:        abi_x86_64 amd64 cpu_flags_x86_avx cpu_flags_x86_avx2 cpu_flags_x86_fma3 cpu_flags_x86_sse cpu_flags_x86_sse2 cpu_flags_x86_sse3 cpu_flags_x86_sse4_1 cpu_flags_x86_sse4_2 cuda elibc_glibc kernel_linux python python_targets_python3_8 userland_GNU xla
 * FEATURES:   network-sandbox preserve-libs sandbox userpriv usersandbox
 * Checking for at least 5 GiB RAM ... [ ok ]
 * Checking for at least 10 GiB disk space at "/var/tmp/portage/sci-libs/tensorflow-2.5.0/temp" ... [ ok ]
 * Package:    sci-libs/tensorflow-2.5.0
 * Repository: gentoo
 * Maintainer: perfinion@gentoo.org
 * USE:        abi_x86_64 amd64 cpu_flags_x86_avx cpu_flags_x86_avx2 cpu_flags_x86_fma3 cpu_flags_x86_sse cpu_flags_x86_sse2 cpu_flags_x86_sse3 cpu_flags_x86_sse4_1 cpu_flags_x86_sse4_2 cuda elibc_glibc kernel_linux python python_targets_python3_8 userland_GNU xla
 * FEATURES:   network-sandbox preserve-libs sandbox userpriv usersandbox
 * TensorFlow 2.0 is a major release that contains some incompatibilities
 * with TensorFlow 1.x. For more information about migrating to TF2.0 see:
 * https://www.tensorflow.org/guide/migrate
 * python3_8: running count_impls
 * Checking for at least 5 GiB RAM ... [ ok ]
 * Checking for at least 16 GiB disk space at "/var/tmp/portage/sci-libs/tensorflow-2.5.0/temp" ... [ ok ]
>>> Unpacking source...
>>> Unpacking tensorflow-2.5.0.tar.gz to /var/tmp/portage/sci-libs/tensorflow-2.5.0/work
>>> Unpacking tensorflow-patches-2.5.0.tar.bz2 to /var/tmp/portage/sci-libs/tensorflow-2.5.0/work
[...]
[27,858 / 29,047] Compiling tensorflow/compiler/xla/service/gpu/ir_emitter_unnested.cc; 17s local ... (6 actions, 5 running)
[31,211 / 31,624] Compiling tensorflow/compiler/tf2xla/kernels/lower_upper_bound_ops.cc; 9s local ... (6 actions, 5 running)
Target //tensorflow/tools/pip_package:build_pip_package up-to-date:
  bazel-bin/tensorflow/tools/pip_package/build_pip_package
INFO: Elapsed time: 12778.919s, Critical Path: 190.88s
INFO: 26436 processes: 3364 internal, 23072 local.
INFO: Build completed successfully, 26436 total actions
INFO: Build completed successfully, 26436 total actions
bazel --bazelrc=/var/tmp/portage/sci-libs/tensorflow-2.5.0/temp/bazelrc --output_base=/var/tmp/portage/sci-libs/tensorflow-2.5.0/work/tensorflow-2.5.0-bazel-base shutdown
WARNING: Running command "shutdown" in batch mode. Batch mode is triggered when not running Bazel within a workspace. If you intend to shutdown an existing Bazel server, run "bazel shutdown" from the directory where it was started.
WARNING: ignoring LD_PRELOAD in environment.
>>> Source compiled.

MAKEOPTS="-j6" ; ebuild /usr/portage/sci-libs/tensorflow/tensorflow-2.5.0.ebuild merge
[...]
>>> Completed installing sci-libs/tensorflow-2.5.0 into /var/tmp/portage/sci-libs/tensorflow-2.5.0/image

 * Final size of build directory: 17706292 KiB (16.8 GiB)
 * Final size of installed tree:   1888412 KiB ( 1.8 GiB)
 * QA Notice: DISTUTILS_USE_SETUPTOOLS is not used when DISTUTILS_OPTIONAL
 * is enabled.
 * Verifying compiled files in /usr/lib/python3.8/site-packages
 *
 * QA Notice: This package seems to contain tests but they are not enabled.
 * Please either run tests (via distutils_enable_tests or declaring
 * python_test yourself), or add RESTRICT="test" along with an explanatory
 * comment if tests cannot be run.
 *
[...]
>>> /usr/lib64/libtensorflow.so -> libtensorflow.so.2
>>> sci-libs/tensorflow-2.5.0 merged.
>>> Regenerating /etc/ld.so.cache...

emerge -1av sci-visualization/tensorboard

but still no luck using TensorFlow:

model = keras.Sequential(
    [
        keras.Input(shape=(76, 36, 1)),
        layers.Conv2D(32, kernel_size=(3, 3), activation="relu"),
        layers.MaxPooling2D(pool_size=(2, 2)),
        layers.Conv2D(64, kernel_size=(3, 3), activation="relu"),
        layers.MaxPooling2D(pool_size=(2, 2)),
        layers.Flatten(),
        layers.Dropout(0.5),
        layers.Dense(10, activation="softmax"),
    ]
)

-> physical GPU (device: 0, name: NVIDIA GeForce GTX 1050 Ti, pci bus id: 0000:01:00.0, compute capability: 6.1)
Traceback (most recent call last):
  File "./6-train-model.py", line 198, in <module>
    model = keras.Sequential(
  File "/usr/lib/python3.8/site-packages/tensorflow/python/training/tracking/base.py", line 522, in _method_wrapper
    result = method(self, *args, **kwargs)
  File "/usr/lib/python3.8/site-packages/tensorflow/python/keras/engine/sequential.py", line 114, in __init__
    super(functional.Functional, self).__init__(  # pylint: disable=bad-super-call
  File "/usr/lib/python3.8/site-packages/tensorflow/python/training/tracking/base.py", line 522, in _method_wrapper
    result = method(self, *args, **kwargs)
  File "/usr/lib/python3.8/site-packages/tensorflow/python/keras/engine/training.py", line 318, in __init__
    self._init_batch_counters()
  File "/usr/lib/python3.8/site-packages/tensorflow/python/training/tracking/base.py", line 522, in _method_wrapper
    result = method(self, *args, **kwargs)
  File "/usr/lib/python3.8/site-packages/tensorflow/python/keras/engine/training.py", line 326, in _init_batch_counters
    self._train_counter = variables.Variable(0, dtype='int64', aggregation=agg)
  File "/usr/lib/python3.8/site-packages/tensorflow/python/ops/variables.py", line 262, in __call__
    return cls._variable_v2_call(*args, **kwargs)
  File "/usr/lib/python3.8/site-packages/tensorflow/python/ops/variables.py", line 244, in _variable_v2_call
    return previous_getter(
  File "/usr/lib/python3.8/site-packages/tensorflow/python/ops/variables.py", line 237, in <lambda>
    previous_getter = lambda **kws: default_variable_creator_v2(None, **kws)
  File "/usr/lib/python3.8/site-packages/tensorflow/python/ops/variable_scope.py", line 2662, in default_variable_creator_v2
    return resource_variable_ops.ResourceVariable(
  File "/usr/lib/python3.8/site-packages/tensorflow/python/ops/variables.py", line 264, in __call__
    return super(VariableMetaclass, cls).__call__(*args, **kwargs)
  File "/usr/lib/python3.8/site-packages/tensorflow/python/ops/resource_variable_ops.py", line 1584, in __init__
    self._init_from_args(
  File "/usr/lib/python3.8/site-packages/tensorflow/python/ops/resource_variable_ops.py", line 1738, in _init_from_args
    handle = eager_safe_variable_handle(
  File "/usr/lib/python3.8/site-packages/tensorflow/python/ops/resource_variable_ops.py", line 237, in eager_safe_variable_handle
    return _variable_handle_from_shape_and_dtype(shape, dtype, shared_name, name,
  File "/usr/lib/python3.8/site-packages/tensorflow/python/ops/resource_variable_ops.py", line 177, in _variable_handle_from_shape_and_dtype
    cpp_shape_inference_pb2.CppShapeInferenceResult.HandleShapeAndType(
TypeError: Parameter to MergeFrom() must be instance of same class: expected tensorflow.TensorShapeProto got tensorflow.TensorShapeProto.
Yeah, I too currently receive the "tensorflow.TensorShapeProto got tensorflow.TensorShapeProto" error.

At first I was not able to build against the current protobuf-3.17.3, so I downgraded to 3.15.8 and built tensorflow-2.5.0 successfully. With the older version of protobuf, however, I got the above error. I then upgraded protobuf to 3.17.3 but without rebuilding tensorflow. I no longer got the above error, and the test suite I was running (the object-detection one) reported no errors:

python3 object_detection/builders/model_builder_tf2_test.py

Nonetheless portage was adamant about tensorflow needing to be rebuilt due to the protobuf update. I ran into the crosstool_wrapper_driver_is_not_gcc error while trying to build tensorflow against the now updated protobuf-3.17.3, however (as well as some sporadic other build errors that I never saw on the next rebuild). I believe I rebuilt cudnn and grpc(io?) and nvidia-cuda-toolkit, and after that somehow rebuilt tensorflow-2.5.0 successfully. Unfortunately, after this rebuild I am where I am now, and I receive the mentioned error when running the tests ("tensorflow.TensorShapeProto got tensorflow.TensorShapeProto").

I have tried upgrading to protobuf-9999 without rebuilding tensorflow, same error. I tried rebuilding tensorflow against protobuf-9999, still the same error.

For what it is worth, I have a 56 GB tmpfs on /var/tmp/portage

tmpfs /var/tmp/portage tmpfs size=56G

and I can build tensorflow in memory with -j9 (I only have 8 threads so any more is pointless, and even 9 is probably a stretch).
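For completeness, the matching /etc/fstab entry would look roughly like this (a minimal sketch with only the size option I actually use; the two trailing zeros are the standard dump/pass fields):

# Sketch: tmpfs for the portage build directory, then mount it.
echo 'tmpfs  /var/tmp/portage  tmpfs  size=56G  0 0' >> /etc/fstab
mount /var/tmp/portage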
Is this patch perhaps worth trying out? It looks like it is set to be included in tensorflow-2.6 https://github.com/tensorflow/tensorflow/issues/50545#issuecomment-872307752 https://github.com/tensorflow/tensorflow/commit/95abf88e4c117f8445308c3174cc42795a6694e6 I can not start another build to try it right now (probably when I go to bed).
Swapping around the two import lines in /usr/lib/python3.8/site-packages/tensorflow/python/__init__.py that were mentioned in the GitHub comment, so that they now read

from tensorflow.python import pywrap_tensorflow as _pywrap_tensorflow
from tensorflow.python.eager import context

seems to make the module/protobuf error go away.
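In case anyone wants to script that swap, a rough sketch with GNU sed - it assumes the two imports are adjacent and in the original (eager-first) order, so check the file afterwards:

# Sketch: swap the 'eager import context' line with the line that follows it
# (assumes GNU sed and that the two imports are adjacent in the original order).
f=/usr/lib/python3.8/site-packages/tensorflow/python/__init__.py
sed -i '/^from tensorflow.python.eager import context$/{N;s/\(.*\)\n\(.*\)/\2\n\1/}' "$f"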
(In reply to ykui from comment #11)
> Is this patch perhaps worth trying out? It looks like it is set to be
> included in tensorflow-2.6
> 
> https://github.com/tensorflow/tensorflow/issues/50545#issuecomment-872307752
> 
> https://github.com/tensorflow/tensorflow/commit/
> 95abf88e4c117f8445308c3174cc42795a6694e6
> 
> I can not start another build to try it right now (probably when I go to
> bed).

Interesting that everything seems to work when you compile against the old protobuf and then simply update protobuf to the latest version without recompiling TensorFlow...

I did rebuild my entire @world tree at some point, so I don't think it is a problem of recompiling other packages against the latest protobuf version.

I applied the above-mentioned patch along with the example_parsing_ops.cc fix (the sed line from previous comments).

TensorFlow 2.5 is compiling now. In somewhat less than 6 hours we'll know whether this patch fixes the "expected tensorflow.TensorShapeProto got tensorflow.TensorShapeProto." issue.
(In reply to ykui from comment #12)
> It seems swapping around the two import lines in
> /usr/lib/python3.8/site-packages/tensorflow/python/__init__.py
> that were mentioned in the github-comment so that they now say
> 
> from tensorflow.python import pywrap_tensorflow as _pywrap_tensorflow
> from tensorflow.python.eager import context
> 
> seems to make the module/protobuf-error go away.

I can confirm that this works!

[...]
-> physical GPU (device: 0, name: NVIDIA GeForce GTX 1050 Ti, pci bus id: 0000:01:00.0, compute capability: 6.1)
WARNING:tensorflow:Please add `keras.layers.InputLayer` instead of `keras.Input` to Sequential model. `keras.Input` is intended to be used by Functional model.
Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
conv2d (Conv2D)              (None, 74, 34, 32)        320
_________________________________________________________________
max_pooling2d (MaxPooling2D) (None, 37, 17, 32)        0
_________________________________________________________________
conv2d_1 (Conv2D)            (None, 35, 15, 64)        18496
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 17, 7, 64)         0
_________________________________________________________________
flatten (Flatten)            (None, 7616)              0
_________________________________________________________________
dropout (Dropout)            (None, 7616)              0
_________________________________________________________________
dense (Dense)                (None, 10)                76170
=================================================================
Total params: 94,986
Trainable params: 94,986
Non-trainable params: 0
_________________________________________________________________
2021-07-08 13:26:47.895867: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:176] None of the MLIR Optimization Passes are enabled (registered 2)
2021-07-08 13:26:47.896111: I tensorflow/core/platform/profile_utils/cpu_utils.cc:114] CPU Frequency: 3699850000 Hz
Epoch 1/50
2021-07-08 13:26:48.188259: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudnn.so.8
2021-07-08 13:26:48.740788: I tensorflow/stream_executor/cuda/cuda_dnn.cc:359] Loaded cuDNN version 8005
2021-07-08 13:26:50.748375: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcublas.so.11
2021-07-08 13:26:50.987907: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcublasLt.so.11
654/654 [==============================] - 14s 15ms/step - loss: 5.4520 - accuracy: 0.9454 - val_loss: 12.9417 - val_accuracy: 0.1317
Epoch 2/50
654/654 [==============================] - 10s 15ms/step - loss: 0.1530 - accuracy: 0.9742 - val_loss: 23.6971 - val_accuracy: 0.1317
Epoch 3/50
654/654 [==============================] - 10s 15ms/step - loss: 0.4149 - accuracy: 0.9710 - val_loss: 52.9526 - val_accuracy: 0.1317
Epoch 4/50
654/654 [==============================] - 10s 15ms/step - loss: 0.4733 - accuracy: 0.9800 - val_loss: 102.9791 - val_accuracy: 0.1317
Epoch 5/50
654/654 [==============================] - 10s 15ms/step - loss: 0.1687 - accuracy: 0.9892 - val_loss: 82.1729 - val_accuracy: 0.1317
Epoch 6/50
654/654 [==============================] - 10s 15ms/step - loss: 0.3617 - accuracy: 0.9923 - val_loss: 154.5778 - val_accuracy: 0.1317
Epoch 7/50
654/654 [==============================] - 10s 15ms/step - loss: 0.0751 - accuracy: 0.9945 - val_loss: 84.9845 - val_accuracy: 0.1317
Epoch 8/50
617/654 [===========================>..] - ETA: 0s - loss: 0.0236 - accuracy: 0.9973
I guess it is a different bug strictly speaking, but is tensorflow actually compatible with numpy-1.2{0,1}.x? When running a model I get this error:

NotImplementedError: Cannot convert a symbolic Tensor (cond_2/strided_slice:0) to a numpy array. This error may indicate that you're trying to pass a Tensor to a NumPy call, which is not supported

which, from searching around, has been suggested to be a numpy incompatibility:

https://stackoverflow.com/questions/66207609/notimplementederror-cannot-convert-a-symbolic-tensor-lstm-2-strided-slice0-t/66207610

Portage does not have numpy-1.19 or earlier in the tree, and if I install an earlier numpy version like numpy-1.18 with pip, I get this error:

RuntimeError: module compiled against API version 0xe but this version of numpy is 0xd

I do not believe I can rebuild tensorflow against pip-numpy.
(In reply to Bjoern Olausson from comment #13)
> (In reply to ykui from comment #11)
> > Is this patch perhaps worth trying out? It looks like it is set to be
> > included in tensorflow-2.6
> > 
> > https://github.com/tensorflow/tensorflow/issues/50545#issuecomment-872307752
> > 
> > https://github.com/tensorflow/tensorflow/commit/
> > 95abf88e4c117f8445308c3174cc42795a6694e6
> > 
> > I can not start another build to try it right now (probably when I go to
> > bed).
> 
> Interesting that everything seems to work when you compile against old
> protobuf and then simply update protobuf to the latest version without
> recompiling Tensorflow...
> 
> I did rebuild my entire @world tree at some point... so I don't think it is
> a problem of recompiling other packages against the latest protobuf version.
> 
> I applied the above mentioned patch alongside with the example_parsing_ops
> .cc fix (sed line from previous comments).
> 
> TensorFlow 2.5 is compiling now. In somewhat less than 6h we know if this
> patch fixed the "expected tensorflow.TensorShapeProto got
> tensorflow.TensorShapeProto." issue.

The "patch" does not fix the "TypeError: Parameter to MergeFrom() must be instance of same class: expected tensorflow.TensorShapeProto got tensorflow.TensorShapeProto." error. I still have to swap the lines in /usr/lib/python3.8/site-packages/tensorflow/python/__init__.py.
ERROR: /var/tmp/portage/sci-libs/tensorflow-2.5.0/work/tensorflow-2.5.0-python3_8/tensorflow/core/kernels/BUILD:4393:18: C++ compilation of rule '//tensorflow/core/kernels:example_parsing_ops' failed (Exit 1): gcc failed: error executing command
  (cd /var/tmp/portage/sci-libs/tensorflow-2.5.0/work/tensorflow-2.5.0-python3_8-bazel-base/execroot/org_tensorflow && \
  exec env - \
    HOME=/var/tmp/portage/sci-libs/tensorflow-2.5.0/homedir \
    KERAS_HOME=/var/tmp/portage/sci-libs/tensorflow-2.5.0/temp/.keras \
    PATH=/var/tmp/portage/sci-libs/tensorflow-2.5.0/temp/python3.8/bin:/var/tmp/portage/._portage_reinstall_.gcevqsmz/bin/ebuild-helpers/xattr:/usr/lib/portage/python3.9/ebuild-helpers/xattr:/var/tmp/portage/._portage_reinstall_.gcevqsmz/bin/ebuild-helpers:/usr/lib/portage/python3.9/ebuild-helpers:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/opt/bin:/usr/lib/llvm/12/bin:/usr/lib/llvm/11/bin \
    PWD=/proc/self/cwd \
    PYTHON_BIN_PATH=/usr/bin/python3.8 \
    PYTHON_LIB_PATH=/usr/lib/python3.8/site-packages \
    TF2_BEHAVIOR=1 \
    TF_SYSTEM_LIBS=absl_py,astor_archive,astunparse_archive,boringssl,com_github_googlecloudplatform_google_cloud_cpp,com_github_grpc_grpc,com_google_protobuf,curl,cython,dill_archive,double_conversion,enum34_archive,flatbuffers,functools32_archive,gast_archive,gif,hwloc,icu,jsoncpp_git,libjpeg_turbo,lmdb,nasm,nsync,opt_einsum_archive,org_sqlite,pasta,pcre,png,pybind11,six_archive,snappy,tblib_archive,termcolor_archive,typing_extensions_archive,wrapt,zlib \
  /usr/bin/gcc -U_FORTIFY_SOURCE -fstack-protector -Wall -Wunused-but-set-parameter -Wno-free-nonheap-object -fno-omit-frame-pointer -g0 -O2 '-D_FORTIFY_SOURCE=1' -DNDEBUG -ffunction-sections -fdata-sections '-std=c++0x' -MD -MF bazel-out/k8-opt/bin/tensorflow/core/kernels/_objs/example_parsing_ops/example_parsing_ops.pic.d '-frandom-seed=bazel-out/k8-opt/bin/tensorflow/core/kernels/_objs/example_parsing_ops/example_parsing_ops.pic.o' -fPIC -DTF_USE_SNAPPY -DEIGEN_MPL2_ONLY '-DEIGEN_MAX_ALIGN_BYTES=64' -iquote . -iquote bazel-out/k8-opt/bin -iquote external/com_google_absl -iquote bazel-out/k8-opt/bin/external/com_google_absl -iquote external/nsync -iquote bazel-out/k8-opt/bin/external/nsync -iquote external/eigen_archive -iquote bazel-out/k8-opt/bin/external/eigen_archive -iquote external/gif -iquote bazel-out/k8-opt/bin/external/gif -iquote external/libjpeg_turbo -iquote bazel-out/k8-opt/bin/external/libjpeg_turbo -iquote external/com_google_protobuf -iquote bazel-out/k8-opt/bin/external/com_google_protobuf -iquote external/com_googlesource_code_re2 -iquote bazel-out/k8-opt/bin/external/com_googlesource_code_re2 -iquote external/farmhash_archive -iquote bazel-out/k8-opt/bin/external/farmhash_archive -iquote external/fft2d -iquote bazel-out/k8-opt/bin/external/fft2d -iquote external/highwayhash -iquote bazel-out/k8-opt/bin/external/highwayhash -iquote external/zlib -iquote bazel-out/k8-opt/bin/external/zlib -iquote external/double_conversion -iquote bazel-out/k8-opt/bin/external/double_conversion -iquote external/snappy -iquote bazel-out/k8-opt/bin/external/snappy -iquote external/curl -iquote bazel-out/k8-opt/bin/external/curl -iquote external/boringssl -iquote bazel-out/k8-opt/bin/external/boringssl -iquote external/jsoncpp_git -iquote bazel-out/k8-opt/bin/external/jsoncpp_git -isystem third_party/eigen3/mkl_include -isystem bazel-out/k8-opt/bin/third_party/eigen3/mkl_include -isystem external/eigen_archive -isystem bazel-out/k8-opt/bin/external/eigen_archive -isystem external/farmhash_archive/src -isystem bazel-out/k8-opt/bin/external/farmhash_archive/src -w -DAUTOLOAD_DYNAMIC_KERNELS -I/usr/include/jsoncpp '-std=c++14' '-mtune=haswell' -O2 -pipe -msse -msse2 -msse3 -msse4.1 -msse4.2 -DEIGEN_AVOID_STL_ARRAY -Iexternal/gemmlowp -Wno-sign-compare '-ftemplate-depth=900' -fno-exceptions -DINTEL_MKL -msse3 -pthread '-DINTEL_MKL=1' -fno-canonical-system-headers -Wno-builtin-macro-redefined '-D__DATE__="redacted"' '-D__TIMESTAMP__="redacted"' '-D__TIME__="redacted"' -c tensorflow/core/kernels/example_parsing_ops.cc -o bazel-out/k8-opt/bin/tensorflow/core/kernels/_objs/example_parsing_ops/example_parsing_ops.pic.o)
Execution platform: @local_execution_config_platform//:platform
In file included from ./tensorflow/core/framework/op_kernel.h:35,
                 from ./tensorflow/core/framework/numeric_op.h:19,
                 from tensorflow/core/kernels/example_parsing_ops.cc:27:
tensorflow/core/kernels/example_parsing_ops.cc: In member function 'virtual void tensorflow::DecodeJSONExampleOp::Compute(tensorflow::OpKernelContext*)':
tensorflow/core/kernels/example_parsing_ops.cc:1221:57: error: 'class google::protobuf::util::status_internal::Status' has no member named 'error_message'; did you mean 'error_message_'?
 1221 |                         string(status.error_message())));
      |                                       ^~~~~~~~~~~~~
./tensorflow/core/framework/op_requires.h:45:46: note: in definition of macro 'OP_REQUIRES'
   45 |     (CTX)->CtxFailure(__FILE__, __LINE__, (STATUS)); \
      |                                              ^~~~~~
tensorflow/core/kernels/example_parsing_ops.cc:1221:57: error: 'std::string google::protobuf::util::status_internal::Status::error_message_' is private within this context
 1221 |                         string(status.error_message())));
      |                                       ^~~~~~~~~~~~~
./tensorflow/core/framework/op_requires.h:45:46: note: in definition of macro 'OP_REQUIRES'
   45 |     (CTX)->CtxFailure(__FILE__, __LINE__, (STATUS)); \
      |                                              ^~~~~~
In file included from /usr/include/google/protobuf/stubs/logging.h:36,
                 from /usr/include/google/protobuf/io/coded_stream.h:150,
                 from bazel-out/k8-opt/bin/tensorflow/core/protobuf/error_codes.pb.h:23,
                 from ./tensorflow/core/platform/status.h:30,
                 from ./tensorflow/core/lib/core/status.h:19,
                 from ./tensorflow/core/lib/monitoring/counter.h:37,
                 from ./tensorflow/core/framework/metrics.h:19,
                 from ./tensorflow/core/common_runtime/metrics.h:22,
                 from tensorflow/core/kernels/example_parsing_ops.cc:23:
/usr/include/google/protobuf/stubs/status.h:97:15: note: declared private here
   97 |   std::string error_message_;
      |               ^~~~~~~~~~~~~~
INFO: Elapsed time: 12507.846s, Critical Path: 275.08s
INFO: 4726 processes: 346 internal, 4380 local.
FAILED: Build did NOT complete successfully
Created attachment 723883 [details]
tensorflow-2.5.0-r1.ebuild

Until the next release, I created an ebuild (tensorflow-2.5.0-r1.ebuild) plus a new patch (StatusMessage_TypeError.patch) to address this bug.

Cheers,
Bjoern
Created attachment 723886 [details, diff] StatusMessage_TypeError.patch Patch required for tensorflow-2.5.0-r1.ebuild
*** Bug 802660 has been marked as a duplicate of this bug. ***
*** Bug 804564 has been marked as a duplicate of this bug. ***
*** Bug 805305 has been marked as a duplicate of this bug. ***
The bug has been closed via the following commit(s):

https://gitweb.gentoo.org/repo/gentoo.git/commit/?id=82a04476b0dd7ea1628495350b035908e72c1d94

commit 82a04476b0dd7ea1628495350b035908e72c1d94
Author:     Jason Zaman <perfinion@gentoo.org>
AuthorDate: 2021-08-01 13:13:54 +0000
Commit:     Jason Zaman <perfinion@gentoo.org>
CommitDate: 2021-08-01 13:19:12 +0000

    sci-libs/tensorflow: Add python3_9 and build against proto-3.16

    Protobuf 3.16 changed the status API in
    https://github.com/protocolbuffers/protobuf/commit/59ea5c8f19de47dc15cbce2e2e97d9de01d50fb9
    so must be patched. All deps now support python3_9 as well so enable
    support in TF

    Closes: https://bugs.gentoo.org/800824
    Closes: https://bugs.gentoo.org/802732
    Package-Manager: Portage-3.0.20, Repoman-3.0.2
    Signed-off-by: Jason Zaman <perfinion@gentoo.org>

 sci-libs/tensorflow/Manifest                   |   1 +
 sci-libs/tensorflow/tensorflow-2.5.0-r1.ebuild | 410 +++++++++++++++++++++++++
 2 files changed, 411 insertions(+)