Bug 700030 - sci-libs/tensorflow-2.0.0 does not build after the cuda USE flag is turned on
Summary: sci-libs/tensorflow-2.0.0 does not build after the cuda USE flag is turned on
Status: RESOLVED FIXED
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: Current packages
Hardware: All Linux
Importance: Normal normal
Assignee: Jason Zaman
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2019-11-13 18:10 UTC by Hendrik Klug
Modified: 2020-05-08 19:06 UTC
CC List: 1 user

See Also:
Package list:
Runtime testing required: ---


Attachments
first 1000 lines from the build.log of emerge tensorflow (head_build.log, 135.67 KB, application/octet-stream)
2019-11-13 21:23 UTC, Hendrik Klug
last 1000 lines from the build.log of emerge tensorflow (tail_build.log, 592.88 KB, text/plain)
2019-11-13 21:24 UTC, Hendrik Klug

Description Hendrik Klug 2019-11-13 18:10:48 UTC
I changed the cuda USE flag with 
``` 
echo "sci-libs/tensorflow cuda" >> /etc/portage/package.use/manual 
```
I was able to build this same version before, but since this change the build fails.
Comment 1 Hendrik Klug 2019-11-13 21:23:32 UTC
Created attachment 596028 [details]
first 1000 lines from the build.log of emerge tensorflow
Comment 2 Hendrik Klug 2019-11-13 21:24:02 UTC
Created attachment 596030 [details]
last 1000 lines from the build.log of emerge tensorflow
Comment 3 Jason Zaman gentoo-dev 2019-11-14 02:37:16 UTC
/usr/bin/ld: /usr/lib64/libjsoncpp.so.21: undefined reference to `std::__cxx11::basic_ostringstream<char, std::char_traits<char>, std::allocator<char> >::basic_ostringstream()@GLIBCXX_3.4.26'
collect2: error: ld returned 1 exit status

That's strange. What compiler are you on? Can you rebuild jsoncpp first and see if that makes a difference? I haven't really seen these types of _cxx11 errors since the GCC-4.9 upgrade, but it does look a little similar. Also, does NVIDIA's nvcc work with your current compiler? (I'm not sure if it supports gcc9.)

You should probably turn off the python_targets_python2_7 USE flag too. It doubles the build time, and python2 will be dead soon anyway.
Comment 4 Hendrik Klug 2019-11-15 10:36:40 UTC
Hello Jason, thank you for your answer!

I have tried to rebuild jsoncpp, but it doesn't change anything :(

I am on `gcc (Gentoo 9.2.0-r2 p3) 9.2.0`. I have tried downgrading to `gcc (Gentoo 8.3.0-r3 p3) 8.3.0`, but `emerge tensorflow` still gives the same error.
Comment 5 Hendrik Klug 2019-11-15 14:09:17 UTC
(In reply to Hendrik Klug from comment #4)
> Hello Jason, thank you for your answer!
> 
> I have tried to rebuild jsoncpp, but it doesn't change anything :(
> 
> I am on `gcc (Gentoo 9.2.0-r2 p3) 9.2.0` but I have tried downgrading to
> `gcc (Gentoo 8.3.0-r3 p3) 8.3.0`, but `emerge tensorflow` still gives the
> same error.

I built jsoncpp again with the gcc 8.3.0 compiler, and now I'm able to build tensorflow with the cuda USE flag on :)

Thank you for your help!
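
For readers who land here with the same failure, the fix described in this comment can be sketched roughly as follows. The profile name `x86_64-pc-linux-gnu-8.3.0` and the package atoms in the example are assumptions for a typical amd64 install (check `gcc-config -l` for the real profile name). The helper `plan_rebuild` is hypothetical and only prints the commands, so they can be reviewed before being run as root.

```shell
# Hypothetical helper: print (do not run) the commands that switch the
# active GCC profile and rebuild the given packages with it.
# $1: gcc-config profile name; remaining arguments: package atoms.
plan_rebuild() {
    profile=$1
    shift
    printf 'gcc-config %s\n' "$profile"
    # Reload the environment so the current shell picks up the new compiler.
    printf '. /etc/profile\n'
    for atom in "$@"; do
        # --oneshot rebuilds the package without adding it to the world set.
        printf 'emerge --oneshot %s\n' "$atom"
    done
}

# Example (assumed profile name and atoms):
#   plan_rebuild x86_64-pc-linux-gnu-8.3.0 dev-libs/jsoncpp sci-libs/tensorflow
```

Rebuilding the library before tensorflow matters here: the library has to be relinked against the older libstdc++ first.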
Comment 6 Jason Zaman gentoo-dev 2019-11-16 01:05:32 UTC
Great to hear!
Comment 7 Soren Harward 2020-05-08 15:57:07 UTC
I had the same problem with both jsoncpp and flatbuffers, which had been compiled with 9.2.0.  This isn't just a one-off weirdness that went away for one specific user.  This will *always* be a problem when a user has their system GCC >8.3.0, and tries to build tensorflow with CUDA support.

The NVIDIA CUDA compiler (nvcc) in nvidia-cuda-toolkit-10.2 is limited to gcc versions <=8.3. The tensorflow build process automagically uses this gcc version, regardless of what the system's default gcc version is. If the user has a system GCC >8.3, then system libraries like jsoncpp and flatbuffers will be built against newer versions of libstdc++.so. When nvcc tries to link tensorflow to those system libraries, it brings in the version of libstdc++ from gcc-8.3.0, which is older than the one the system libraries were linked against. That version mismatch causes the "undefined reference" errors pointed out in comment #3.
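
One way to confirm this kind of mismatch is to compare the highest `GLIBCXX_*` symbol version a library references with the highest one a given libstdc++ provides. The sketch below is a minimal helper and not part of any ebuild. The function name `max_glibcxx` and the paths in the usage comments are assumptions (the libstdc++ path follows the usual Gentoo layout on amd64), and `sort -V` requires GNU coreutils.

```shell
# Hypothetical helper: read text on stdin (for example the output of
# `objdump -T` or `strings`) and print the highest GLIBCXX_x.y.z tag found.
max_glibcxx() {
    # -o prints only the matching tags; sort -V orders them as versions,
    # so GLIBCXX_3.4.9 sorts before GLIBCXX_3.4.26.
    grep -oE 'GLIBCXX_[0-9]+(\.[0-9]+)*' | sort -u -V | tail -n1
}

# Usage (assumed paths):
#   objdump -T /usr/lib64/libjsoncpp.so.21 | max_glibcxx
#   strings /usr/lib/gcc/x86_64-pc-linux-gnu/8.3.0/libstdc++.so.6 | max_glibcxx
```

If the first result is newer than the second, linking through the older compiler can be expected to fail with undefined references like the `@GLIBCXX_3.4.26` one quoted in comment #3.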

The only way to fix this is to ensure that all C++ system libraries that tensorflow links to are built with the same version of GCC that nvcc uses.  Thus far, jsoncpp and flatbuffers seem to be the only two libraries that cause problems.  But I won't be surprised if more show up in the future.  I finally just package.mask'ed >=sys-devel/gcc-8.4.0 on my system so that I can build tensorflow without having to worry about which system libraries will cause link failures.
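
The mask mentioned above could be set up along these lines. This is a sketch, assuming `/etc/portage/package.mask` is used as a directory. The file name `gcc` and the helper `add_mask` are made up for illustration.

```shell
# Hypothetical helper: append a mask atom to a package.mask file unless the
# exact line is already present.
# $1: atom to mask; $2: path of the mask file.
add_mask() {
    atom=$1
    file=$2
    mkdir -p "$(dirname "$file")"
    touch "$file"
    # -x matches whole lines, -F treats the atom as a fixed string so that
    # ">=" and "." are not interpreted as a pattern.
    grep -qxF -- "$atom" "$file" || printf '%s\n' "$atom" >> "$file"
}

# Usage (assumed file name, run as root):
#   add_mask '>=sys-devel/gcc-8.4.0' /etc/portage/package.mask/gcc
```

Checking for the line first keeps repeated runs from piling up duplicate entries, which a plain `echo ... >>` would do.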

Unless there's some kind of portage function that can detect which version of GCC another package was compiled with, I don't see a feasible way to fix this compile problem in the tensorflow ebuild. The best we might be able to do is detect if the system gcc is >8.3 and, if so, output a warning when USE="cuda" is set, suggesting that the user downgrade to gcc-8.3.0 and rebuild any packages providing libraries that cause tensorflow link failures.
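
The version check itself is simple to express. The following is a plain shell sketch of the comparison, not ebuild code. The function names and the 8.3 limit in the example are assumptions taken from this comment, `sort -V` requires GNU coreutils, and a real ebuild would presumably use Portage's own helpers for version tests and warnings instead.

```shell
# Hypothetical helper: succeed (exit 0) when the major.minor part of
# version $1 is strictly newer than the limit $2 (given as major.minor).
gcc_newer_than() {
    have=$(printf '%s\n' "$1" | cut -d. -f1,2)
    limit=$2
    # Newer means: not equal, and sorting last in version order.
    [ "$have" != "$limit" ] &&
        [ "$(printf '%s\n%s\n' "$have" "$limit" | sort -V | tail -n1)" = "$have" ]
}

# Hypothetical helper: print a warning on stderr if the compiler is too new.
warn_if_gcc_too_new() {
    if gcc_newer_than "$1" "$2"; then
        printf 'WARNING: gcc %s is newer than %s; libraries built with it may not link under nvcc.\n' "$1" "$2" >&2
    fi
}

# Usage (assumed limit):
#   warn_if_gcc_too_new "$(gcc -dumpfullversion)" 8.3
```

Comparing only major.minor keeps 8.3.0 from counting as newer than 8.3, while 8.4.0 and 9.2.0 still trigger the warning.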

Then again, the ebuild warning may be overkill, because the gcc version used by nvcc will change with different releases of nvidia-cuda-toolkit.  So maybe the best we can hope for is that the very few Gentoo+Tensorflow+CUDA users will find this bug report when their build fails, and know how to fix it.