Bug 873055 - sci-libs/torchvision-0.11.2::science uses wrong include dir for pytorch headers, doesn't build CPU target objects but tries to link them, and picks wrong lib dir
Status: UNCONFIRMED
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: Overlays
Hardware: All Linux
Importance: Normal normal
Assignee: Gentoo Science Related Packages
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2022-09-26 14:51 UTC by Michael Moon
Modified: 2023-04-23 14:23 UTC (History)

See Also:
Package list:
Runtime testing required: ---


Attachments
build log (torchvision-0.11.2_build.log.txt,187.97 KB, text/plain)
2022-09-26 14:51 UTC, Michael Moon
caffe2_include_poc.c (caffe2_include_poc.c,253 bytes, text/x-csrc)
2022-11-21 05:36 UTC, Anton Bolshakov

Description Michael Moon 2022-09-26 14:51:43 UTC
Created attachment 814246 [details]
build log

sci-libs/torchvision-0.11.2::science uses wrong include dir for pytorch headers:

Compiling objects...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
[1/35] x86_64-pc-linux-gnu-g++ -MMD -MF /var/tmp/portage/sci-libs/torchvision-0.11.2/work/vision-0.11.2-python3_10/temp.linux-x86_64-3.10/var/tmp/portage/sci-libs/torchvision-0.11.2/work/vision-0.11.2/torchvision/csrc/ops/autocast/nms_kernel.o.d -O2 -pipe -march=native -fPIC -DWITH_CUDA -I/var/tmp/portage/sci-libs/torchvision-0.11.2/work/vision-0.11.2/torchvision/csrc -I/usr/lib/python3.10/site-packages/torch/include -I/usr/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -I/usr/lib/python3.10/site-packages/torch/include/TH -I/usr/lib/python3.10/site-packages/torch/include/THC -I/opt/cuda/include -I/usr/include/python3.10 -c -c /var/tmp/portage/sci-libs/torchvision-0.11.2/work/vision-0.11.2/torchvision/csrc/ops/autocast/nms_kernel.cpp -o /var/tmp/portage/sci-libs/torchvision-0.11.2/work/vision-0.11.2-python3_10/temp.linux-x86_64-3.10/var/tmp/portage/sci-libs/torchvision-0.11.2/work/vision-0.11.2/torchvision/csrc/ops/autocast/nms_kernel.o -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1016"' -DTORCH_EXTENSION_NAME=_C -D_GLIBCXX_USE_CXX11_ABI=1 -std=c++14
FAILED: /var/tmp/portage/sci-libs/torchvision-0.11.2/work/vision-0.11.2-python3_10/temp.linux-x86_64-3.10/var/tmp/portage/sci-libs/torchvision-0.11.2/work/vision-0.11.2/torchvision/csrc/ops/autocast/nms_kernel.o
x86_64-pc-linux-gnu-g++ -MMD -MF /var/tmp/portage/sci-libs/torchvision-0.11.2/work/vision-0.11.2-python3_10/temp.linux-x86_64-3.10/var/tmp/portage/sci-libs/torchvision-0.11.2/work/vision-0.11.2/torchvision/csrc/ops/autocast/nms_kernel.o.d -O2 -pipe -march=native -fPIC -DWITH_CUDA -I/var/tmp/portage/sci-libs/torchvision-0.11.2/work/vision-0.11.2/torchvision/csrc -I/usr/lib/python3.10/site-packages/torch/include -I/usr/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -I/usr/lib/python3.10/site-packages/torch/include/TH -I/usr/lib/python3.10/site-packages/torch/include/THC -I/opt/cuda/include -I/usr/include/python3.10 -c -c /var/tmp/portage/sci-libs/torchvision-0.11.2/work/vision-0.11.2/torchvision/csrc/ops/autocast/nms_kernel.cpp -o /var/tmp/portage/sci-libs/torchvision-0.11.2/work/vision-0.11.2-python3_10/temp.linux-x86_64-3.10/var/tmp/portage/sci-libs/torchvision-0.11.2/work/vision-0.11.2/torchvision/csrc/ops/autocast/nms_kernel.o -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1016"' -DTORCH_EXTENSION_NAME=_C -D_GLIBCXX_USE_CXX11_ABI=1 -std=c++14
/var/tmp/portage/sci-libs/torchvision-0.11.2/work/vision-0.11.2/torchvision/csrc/ops/autocast/nms_kernel.cpp:4:10: fatal error: torch/types.h: No such file or directory
4 | #include <torch/types.h>
  |          ^~~~~~~~~~~~~~~
compilation terminated.
[2/35] x86_64-pc-linux-gnu-g++ -MMD -MF /var/tmp/portage/sci-libs/torchvision-0.11.2/work/vision-0.11.2-python3_10/temp.linux-x86_64-3.10/var/tmp/portage/sci-libs/torchvision-0.11.2/work/vision-0.11.2/torchvision/csrc/ops/autocast/ps_roi_pool_kernel.o.d -O2 -pipe -march=native -fPIC -DWITH_CUDA -I/var/tmp/portage/sci-libs/torchvision-0.11.2/work/vision-0.11.2/torchvision/csrc -I/usr/lib/python3.10/site-packages/torch/include -I/usr/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -I/usr/lib/python3.10/site-packages/torch/include/TH -I/usr/lib/python3.10/site-packages/torch/include/THC -I/opt/cuda/include -I/usr/include/python3.10 -c -c /var/tmp/portage/sci-libs/torchvision-0.11.2/work/vision-0.11.2/torchvision/csrc/ops/autocast/ps_roi_pool_kernel.cpp -o /var/tmp/portage/sci-libs/torchvision-0.11.2/work/vision-0.11.2-python3_10/temp.linux-x86_64-3.10/var/tmp/portage/sci-libs/torchvision-0.11.2/work/vision-0.11.2/torchvision/csrc/ops/autocast/ps_roi_pool_kernel.o -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1016"' -DTORCH_EXTENSION_NAME=_C -D_GLIBCXX_USE_CXX11_ABI=1 -std=c++14
FAILED: /var/tmp/portage/sci-libs/torchvision-0.11.2/work/vision-0.11.2-python3_10/temp.linux-x86_64-3.10/var/tmp/portage/sci-libs/torchvision-0.11.2/work/vision-0.11.2/torchvision/csrc/ops/autocast/ps_roi_pool_kernel.o
x86_64-pc-linux-gnu-g++ -MMD -MF /var/tmp/portage/sci-libs/torchvision-0.11.2/work/vision-0.11.2-python3_10/temp.linux-x86_64-3.10/var/tmp/portage/sci-libs/torchvision-0.11.2/work/vision-0.11.2/torchvision/csrc/ops/autocast/ps_roi_pool_kernel.o.d -O2 -pipe -march=native -fPIC -DWITH_CUDA -I/var/tmp/portage/sci-libs/torchvision-0.11.2/work/vision-0.11.2/torchvision/csrc -I/usr/lib/python3.10/site-packages/torch/include -I/usr/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -I/usr/lib/python3.10/site-packages/torch/include/TH -I/usr/lib/python3.10/site-packages/torch/include/THC -I/opt/cuda/include -I/usr/include/python3.10 -c -c /var/tmp/portage/sci-libs/torchvision-0.11.2/work/vision-0.11.2/torchvision/csrc/ops/autocast/ps_roi_pool_kernel.cpp -o /var/tmp/portage/sci-libs/torchvision-0.11.2/work/vision-0.11.2-python3_10/temp.linux-x86_64-3.10/var/tmp/portage/sci-libs/torchvision-0.11.2/work/vision-0.11.2/torchvision/csrc/ops/autocast/ps_roi_pool_kernel.o -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1016"' -DTORCH_EXTENSION_NAME=_C -D_GLIBCXX_USE_CXX11_ABI=1 -std=c++14
/var/tmp/portage/sci-libs/torchvision-0.11.2/work/vision-0.11.2/torchvision/csrc/ops/autocast/ps_roi_pool_kernel.cpp:4:10: fatal error: torch/types.h: No such file or directory
4 | #include <torch/types.h>
  |          ^~~~~~~~~~~~~~~
compilation terminated.

There are several more failures of the same kind (Bugzilla is complaining about comment length).

The headers seem to be provided by sci-libs/caffe2 in /usr/include/torch/csrc/api/include/torch/, while sci-libs/torchvision seems to expect to find them under /usr/lib/python3.10/site-packages/torch/include/torch/csrc/api/include.

As a workaround, if I run ln -s /usr/include /usr/lib/python3.10/site-packages/torch/, the headers are found and compilation seems to work.
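
The include-path mismatch described above can be checked mechanically. The following is a minimal Python sketch, not part of the ebuild or of torchvision: it extracts the -I directories from a compiler command line and reports which of them, if any, contains a given header. The function names include_dirs and find_header are made up for illustration, and the sample command is shortened from the log, with /usr/include/torch/csrc/api/include added by hand as an assumption about where sci-libs/caffe2 installs the headers.

```python
import shlex
from pathlib import Path
from typing import List, Optional


def include_dirs(command: str) -> List[Path]:
    """Return the -I directories of a compiler command line, in order."""
    dirs = []
    args = shlex.split(command)
    for i, arg in enumerate(args):
        if arg == "-I" and i + 1 < len(args):
            # Form "-I dir": the directory is the next argument.
            dirs.append(Path(args[i + 1]))
        elif arg.startswith("-I") and len(arg) > 2:
            # Form "-Idir": the directory is glued to the flag.
            dirs.append(Path(arg[2:]))
    return dirs


def find_header(header: str, dirs: List[Path]) -> Optional[Path]:
    """Return the first directory in dirs that contains header, or None."""
    for d in dirs:
        if (d / header).is_file():
            return d
    return None


if __name__ == "__main__":
    # Shortened from the failing command in the build log. The last -I is a
    # hypothetical addition pointing at the location reported for sci-libs/caffe2.
    cmd = (
        "x86_64-pc-linux-gnu-g++ "
        "-I/usr/lib/python3.10/site-packages/torch/include "
        "-I/usr/lib/python3.10/site-packages/torch/include/torch/csrc/api/include "
        "-I/usr/include/torch/csrc/api/include "
        "-c nms_kernel.cpp"
    )
    found = find_header("torch/types.h", include_dirs(cmd))
    print(found if found else "torch/types.h not found in any -I directory")
```

If the script reports only the /usr/include/... directory, that would be consistent with the symlink workaround: the headers exist, just not where the extension build looks for them.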

https://github.com/pytorch/pytorch/issues/5964 might be relevant, although it's 4 years old and closed.

---

sci-libs/torchvision-0.11.2::science doesn't build CPU target objects but tries to link them, and picks wrong lib dir:

/usr/lib/gcc/x86_64-pc-linux-gnu/11.3.0/../../../../x86_64-pc-linux-gnu/bin/ld: skipping incompatible /usr/lib/libavutil.so when searching for -lavutil
x86_64-pc-linux-gnu-g++ -shared -Wl,-O1 -Wl,--as-needed -O2 -pipe -march=native /var/tmp/portage/sci-libs/torchvision-0.11.2/work/vision-0.11.2-python3_10/temp.linux-x86_64-3.10/var/tmp/portage/sci-libs/torchvision-0.11.2/work/vision-0.11.2/torchvision/csrc/io/image/cpu/common_jpeg.o /var/tmp/portage/sci-libs/torchvision-0.11.2/work/vision-0.11.2-python3_10/temp.linux-x86_64-3.10/var/tmp/portage/sci-libs/torchvision-0.11.2/work/vision-0.11.2/torchvision/csrc/io/image/cpu/decode_image.o /var/tmp/portage/sci-libs/torchvision-0.11.2/work/vision-0.11.2-python3_10/temp.linux-x86_64-3.10/var/tmp/portage/sci-libs/torchvision-0.11.2/work/vision-0.11.2/torchvision/csrc/io/image/cpu/decode_jpeg.o /var/tmp/portage/sci-libs/torchvision-0.11.2/work/vision-0.11.2-python3_10/temp.linux-x86_64-3.10/var/tmp/portage/sci-libs/torchvision-0.11.2/work/vision-0.11.2/torchvision/csrc/io/image/cpu/decode_png.o /var/tmp/portage/sci-libs/torchvision-0.11.2/work/vision-0.11.2-python3_10/temp.linux-x86_64-3.10/var/tmp/portage/sci-libs/torchvision-0.11.2/work/vision-0.11.2/torchvision/csrc/io/image/cpu/encode_jpeg.o /var/tmp/portage/sci-libs/torchvision-0.11.2/work/vision-0.11.2-python3_10/temp.linux-x86_64-3.10/var/tmp/portage/sci-libs/torchvision-0.11.2/work/vision-0.11.2/torchvision/csrc/io/image/cpu/encode_png.o /var/tmp/portage/sci-libs/torchvision-0.11.2/work/vision-0.11.2-python3_10/temp.linux-x86_64-3.10/var/tmp/portage/sci-libs/torchvision-0.11.2/work/vision-0.11.2/torchvision/csrc/io/image/cpu/read_write_file.o /var/tmp/portage/sci-libs/torchvision-0.11.2/work/vision-0.11.2-python3_10/temp.linux-x86_64-3.10/var/tmp/portage/sci-libs/torchvision-0.11.2/work/vision-0.11.2/torchvision/csrc/io/image/cuda/decode_jpeg_cuda.o /var/tmp/portage/sci-libs/torchvision-0.11.2/work/vision-0.11.2-python3_10/temp.linux-x86_64-3.10/var/tmp/portage/sci-libs/torchvision-0.11.2/work/vision-0.11.2/torchvision/csrc/io/image/image.o -L/usr/lib64 -L/usr/lib/python3.10/site-packages/torch/lib 
-L/opt/cuda/lib64 -L/usr/lib64 -lpng -ljpeg -lnvjpeg -lc10 -ltorch -ltorch_cpu -ltorch_python -lcudart -lc10_cuda -ltorch_cuda -o /var/tmp/portage/sci-libs/torchvision-0.11.2/work/vision-0.11.2-python3_10/lib/torchvision/image.so
/usr/lib/gcc/x86_64-pc-linux-gnu/11.3.0/../../../../x86_64-pc-linux-gnu/bin/ld: skipping incompatible /usr/lib/libswresample.so when searching for -lswresample
/usr/lib/gcc/x86_64-pc-linux-gnu/11.3.0/../../../../x86_64-pc-linux-gnu/bin/ld: skipping incompatible /usr/lib/libswscale.so when searching for -lswscale
/usr/lib/gcc/x86_64-pc-linux-gnu/11.3.0/../../../../x86_64-pc-linux-gnu/bin/ld: cannot find /var/tmp/portage/sci-libs/torchvision-0.11.2/work/vision-0.11.2-python3_10/temp.linux-x86_64-3.10/var/tmp/portage/sci-libs/torchvision-0.11.2/work/vision-0.11.2/torchvision/csrc/io/image/cpu/common_jpeg.o: No such file or directory
/usr/lib/gcc/x86_64-pc-linux-gnu/11.3.0/../../../../x86_64-pc-linux-gnu/bin/ld: cannot find /var/tmp/portage/sci-libs/torchvision-0.11.2/work/vision-0.11.2-python3_10/temp.linux-x86_64-3.10/var/tmp/portage/sci-libs/torchvision-0.11.2/work/vision-0.11.2/torchvision/csrc/io/image/cpu/decode_image.o: No such file or directory
/usr/lib/gcc/x86_64-pc-linux-gnu/11.3.0/../../../../x86_64-pc-linux-gnu/bin/ld: cannot find /var/tmp/portage/sci-libs/torchvision-0.11.2/work/vision-0.11.2-python3_10/temp.linux-x86_64-3.10/var/tmp/portage/sci-libs/torchvision-0.11.2/work/vision-0.11.2/torchvision/csrc/io/image/cpu/decode_jpeg.o: No such file or directory
/usr/lib/gcc/x86_64-pc-linux-gnu/11.3.0/../../../../x86_64-pc-linux-gnu/bin/ld: cannot find /var/tmp/portage/sci-libs/torchvision-0.11.2/work/vision-0.11.2-python3_10/temp.linux-x86_64-3.10/var/tmp/portage/sci-libs/torchvision-0.11.2/work/vision-0.11.2/torchvision/csrc/io/image/cpu/decode_png.o: No such file or directory
/usr/lib/gcc/x86_64-pc-linux-gnu/11.3.0/../../../../x86_64-pc-linux-gnu/bin/ld: cannot find /var/tmp/portage/sci-libs/torchvision-0.11.2/work/vision-0.11.2-python3_10/temp.linux-x86_64-3.10/var/tmp/portage/sci-libs/torchvision-0.11.2/work/vision-0.11.2/torchvision/csrc/io/image/cpu/encode_jpeg.o: No such file or directory
/usr/lib/gcc/x86_64-pc-linux-gnu/11.3.0/../../../../x86_64-pc-linux-gnu/bin/ld: cannot find /var/tmp/portage/sci-libs/torchvision-0.11.2/work/vision-0.11.2-python3_10/temp.linux-x86_64-3.10/var/tmp/portage/sci-libs/torchvision-0.11.2/work/vision-0.11.2/torchvision/csrc/io/image/cpu/encode_png.o: No such file or directory
/usr/lib/gcc/x86_64-pc-linux-gnu/11.3.0/../../../../x86_64-pc-linux-gnu/bin/ld: cannot find /var/tmp/portage/sci-libs/torchvision-0.11.2/work/vision-0.11.2-python3_10/temp.linux-x86_64-3.10/var/tmp/portage/sci-libs/torchvision-0.11.2/work/vision-0.11.2/torchvision/csrc/io/image/cpu/read_write_file.o: No such file or directory
/usr/lib/gcc/x86_64-pc-linux-gnu/11.3.0/../../../../x86_64-pc-linux-gnu/bin/ld: cannot find /var/tmp/portage/sci-libs/torchvision-0.11.2/work/vision-0.11.2-python3_10/temp.linux-x86_64-3.10/var/tmp/portage/sci-libs/torchvision-0.11.2/work/vision-0.11.2/torchvision/csrc/io/image/cuda/decode_jpeg_cuda.o: No such file or directory
/usr/lib/gcc/x86_64-pc-linux-gnu/11.3.0/../../../../x86_64-pc-linux-gnu/bin/ld: cannot find /var/tmp/portage/sci-libs/torchvision-0.11.2/work/vision-0.11.2-python3_10/temp.linux-x86_64-3.10/var/tmp/portage/sci-libs/torchvision-0.11.2/work/vision-0.11.2/torchvision/csrc/io/image/image.o: No such file or directory
/usr/lib/gcc/x86_64-pc-linux-gnu/11.3.0/../../../../x86_64-pc-linux-gnu/bin/ld: skipping incompatible /usr/lib/libm.so when searching for -lm
/usr/lib/gcc/x86_64-pc-linux-gnu/11.3.0/../../../../x86_64-pc-linux-gnu/bin/ld: skipping incompatible /usr/lib/libm.a when searching for -lm
collect2: error: ld returned 1 exit status
collect2: error: ld returned 1 exit status
error: command '/usr/bin/x86_64-pc-linux-gnu-g++' failed with exit code 1

This looks like two separate problems: 1) the build simply never compiles anything for the CPU target (check the attached build log) and then fails at link time because those objects are absent, and 2) the linker is looking in the wrong lib dir for libraries (/usr/lib instead of /usr/lib64).
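
Both symptoms can be narrowed down outside the build. The sketch below is a hedged illustration in Python with made-up helper names (elf_class, missing_objects). It relies on two things: the ELF header stores the word size in its fifth byte (EI_CLASS, 1 for 32-bit and 2 for 64-bit), and ld typically prints "skipping incompatible" when a library of the wrong class or architecture sits in a search directory, after which it keeps searching the remaining directories. Whether that applies to this particular system is an assumption to verify, not a conclusion.

```python
from pathlib import Path
from typing import Iterable, List, Optional

ELF_MAGIC = b"\x7fELF"
ELF_CLASSES = {1: "ELF32", 2: "ELF64"}  # values of the EI_CLASS byte


def elf_class(path) -> Optional[str]:
    """Return "ELF32" or "ELF64" for an ELF file, or None for anything else
    (for example a linker script, which some lib*.so files are)."""
    with open(path, "rb") as f:
        header = f.read(5)
    if len(header) < 5 or header[:4] != ELF_MAGIC:
        return None
    return ELF_CLASSES.get(header[4])


def missing_objects(paths: Iterable[str]) -> List[str]:
    """Return the object files from a link command that do not exist on disk."""
    return [p for p in paths if not Path(p).is_file()]


if __name__ == "__main__":
    # Library paths taken from the linker messages in the report.
    for lib in ("/usr/lib/libavutil.so", "/usr/lib64/libavutil.so"):
        if Path(lib).exists():
            print(lib, elf_class(lib))
        else:
            print(lib, "does not exist")
```

If /usr/lib turns out to hold 32-bit libraries, the "skipping incompatible" lines would be noise, and the missing .o files would be the failure that actually stops the link.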

---

I can't find any issues even vaguely similar to this elsewhere, so I'm guessing it's somehow a Gentoo thing, even though the ebuild looks super basic (perhaps the distutils-r1 eclass is confused?). I'm happy to report it upstream if you think that's more appropriate, though.

I tried bumping the version to 0.13.1 locally (the latest upstream release), but it failed in exactly the same way.
Comment 1 Anton Bolshakov 2022-11-21 05:34:47 UTC
I believe torchvision has been renamed:
  sci-libs/torchvision -> sci-libs/caffe2


but the problem is still there.

There are two sets of include files:
/usr/include/torch
 and
/usr/include/torch/csrc/api/include/torch/

Files in the second directory (such as all.h) contain includes such as
  #include <torch/cuda.h>

All files from the second directory should be moved to the first directory.

It may have something to do with the following code in torch/utils/cpp_extension.py:
    if not is_standalone:
        common_cflags.append(f'-DTORCH_EXTENSION_NAME={name}')
        common_cflags.append('-DTORCH_API_INCLUDE_EXTENSION_H')
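
The two header sets mentioned in this comment can be compared directly. This is a small Python sketch with hypothetical helper names (headers_under, only_in_second). It lists the headers that exist under the second directory but have no counterpart at the same relative path under the first. The two paths in the usage section are the ones quoted in this comment, and the script only reads the filesystem.

```python
from pathlib import Path
from typing import List, Set


def headers_under(root) -> Set[str]:
    """Return the relative paths of all *.h files below root (empty if root is missing)."""
    root = Path(root)
    if not root.is_dir():
        return set()
    return {p.relative_to(root).as_posix() for p in root.rglob("*.h") if p.is_file()}


def only_in_second(first, second) -> List[str]:
    """Return headers found under second that are absent at the same relative path under first."""
    return sorted(headers_under(second) - headers_under(first))


if __name__ == "__main__":
    for name in only_in_second(
        "/usr/include/torch",
        "/usr/include/torch/csrc/api/include/torch",
    ):
        print(name)
```

Any header this prints is reachable as <torch/...> only when /usr/include/torch/csrc/api/include is on the include path.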
Comment 2 Anton Bolshakov 2022-11-21 05:36:39 UTC
Created attachment 835357 [details]
caffe2_include_poc.c

Try to compile it with "gcc caffe2_include_poc.c".
Comment 3 Ștefan Talpalaru 2023-04-05 01:19:36 UTC
Please try torchvision-0.15.1 from my overlay: https://github.com/stefantalpalaru/gentoo-overlay