906094 – sci-libs/pytorch-2.0.0

Status:	RESOLVED FIXED

Alias:	None

Product:	Gentoo Linux
Classification:	Unclassified
Component:	Current packages
Hardware:	All Linux

Importance:	Normal enhancement
Assignee:	Tupone Alfredo

Reported:	2023-05-10 21:29 UTC by Andrew Cameron
Modified:	2023-06-05 21:22 UTC
CC List:	0 users

Attachments
Caffe2 Build log with opencl enabled (caffe2_build.log.gz, 233.88 KB, application/gzip) 2023-05-14 14:08 UTC, Andrew Cameron
pytorch build log (pytorch_build.log.gz, 120.17 KB, application/gzip) 2023-05-14 14:09 UTC, Andrew Cameron
/var/lib/caffe2/CmakeCache.txt (CMakeCache.txt, 45.88 KB, text/plain) 2023-05-14 14:10 UTC, Andrew Cameron
Sample pytorch run showing opencl was not included in the build (pytorch_run.log, 609 bytes, text/x-log) 2023-05-14 14:11 UTC, Andrew Cameron

Description Andrew Cameron 2023-05-10 21:29:20 UTC

The current ebuild for sci-libs/pytorch-2.0.0 does not offer all the build functionality that the ebuild for sci-libs/caffe2-2.0.0-r3 does or that can be enabled if built directly from source code.

Can the ebuild be updated so that the same options can be enabled as when building directly from the source code?
This would let it honor our USE flags and enable the different backends such as opencl, vulkan, opencv, etc.

Comment 1 Tupone Alfredo gentoo-dev

2023-05-11 16:30:03 UTC

The pytorch and caffe2 configurations are closely tied: caffe2 saves the configuration that pytorch then uses. So, barring errors, the caffe2 ebuild and the pytorch ebuild provide the same functionality. If you find options that differ, please file a bug or, better, provide patches.

For the things that are not enabled, please provide a patch or a bug report telling me which dependencies they need. As I don't have most of the backends, a patch or pull request with a tested ebuild would be ideal.

Comment 2 Andrew Cameron 2023-05-11 18:11:21 UTC

The caffe2 ebuild has the options listed below, since it builds from the C code using cmake. The pytorch ebuild has none of these options enabled and does not seem to use cmake.
----
	local mycmakeargs=(
		-DBUILD_CUSTOM_PROTOBUF=OFF
		-DBUILD_SHARED_LIBS=ON

		-DUSE_CCACHE=OFF
		-DUSE_CUDA=$(usex cuda)
		-DUSE_CUDNN=$(usex cuda)
		-DUSE_FAST_NVCC=$(usex cuda)
		-DTORCH_CUDA_ARCH_LIST="${TORCH_CUDA_ARCH_LIST:-3.5 7.0}"
		-DBUILD_NVFUSER=OFF
		-DUSE_DISTRIBUTED=$(usex distributed)
		-DUSE_MPI=$(usex mpi)
		-DUSE_FAKELOWP=OFF
		-DUSE_FBGEMM=$(usex fbgemm)
		-DUSE_FFMPEG=$(usex ffmpeg)
		-DUSE_GFLAGS=ON
		-DUSE_GLOG=ON
		-DUSE_GLOO=$(usex gloo)
		-DUSE_KINETO=OFF # TODO
		-DUSE_LEVELDB=OFF
		-DUSE_MAGMA=OFF # TODO: In GURU as sci-libs/magma
		-DUSE_MKLDNN=OFF
		-DUSE_NCCL=OFF # TODO: NVIDIA Collective Communication Library
		-DUSE_NNPACK=$(usex nnpack)
		-DUSE_QNNPACK=$(usex qnnpack)
		-DUSE_XNNPACK=$(usex xnnpack)
		-DUSE_SYSTEM_XNNPACK=$(usex xnnpack)
		-DUSE_TENSORPIPE=$(usex tensorpipe)
		-DUSE_PYTORCH_QNNPACK=OFF
		-DUSE_NUMPY=$(usex numpy)
		-DUSE_OPENCL=$(usex opencl)
		-DUSE_OPENCV=$(usex opencv)
		-DUSE_OPENMP=$(usex openmp)
		-DUSE_ROCM=OFF # TODO
		-DUSE_SYSTEM_CPUINFO=ON
		-DUSE_SYSTEM_PYBIND11=ON
		-DUSE_UCC=OFF
		-DUSE_VALGRIND=OFF
		-DPYBIND11_PYTHON_VERSION="${EPYTHON#python}"
		-DPYTHON_EXECUTABLE="${PYTHON}"
		-DUSE_ITT=OFF
		-DBLAS=Eigen # avoid the use of MKL, if found on the system
		-DUSE_SYSTEM_EIGEN_INSTALL=ON
		-DUSE_SYSTEM_PTHREADPOOL=ON
		-DUSE_SYSTEM_FXDIV=ON
		-DUSE_SYSTEM_FP16=ON
		-DUSE_SYSTEM_GLOO=ON
		-DUSE_SYSTEM_ONNX=ON
		-DUSE_SYSTEM_SLEEF=ON
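
For readers unfamiliar with the helper used in the excerpt above: `usex` prints one of two strings depending on whether a USE flag is enabled, and without extra arguments those strings are `yes` and `no`, which CMake accepts as boolean values. The following is only a rough Python sketch of that mapping. The function name `usex` mirrors the ebuild helper, while `enabled_flags`, `cmake_args` and the three-option subset are hypothetical and chosen purely for illustration.

```python
def usex(flag, enabled_flags, if_yes="yes", if_no="no"):
    """Mimic the ebuild helper: return if_yes when the USE flag is set, else if_no."""
    return if_yes if flag in enabled_flags else if_no


def cmake_args(enabled_flags):
    """Build a few -DUSE_* arguments the way the mycmakeargs excerpt does."""
    # Hypothetical subset of the options quoted from the caffe2 ebuild.
    options = {"USE_OPENCL": "opencl", "USE_OPENCV": "opencv", "USE_OPENMP": "openmp"}
    return [f"-D{opt}={usex(flag, enabled_flags)}" for opt, flag in options.items()]


if __name__ == "__main__":
    # Example: USE="opencl openmp" with opencv disabled.
    for arg in cmake_args({"opencl", "openmp"}):
        print(arg)
```

Passing an option such as `-DUSE_OPENCL=yes` only tells CMake that the feature was requested; whether the resulting library actually registers a usable backend is a separate question.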

Comment 3 Tupone Alfredo gentoo-dev

2023-05-11 18:39:28 UTC

If you look at both ebuilds, caffe2 saves /var/lib/caffe2/CMakeCache.txt and pytorch reuses it.

The intention is to share the settings.

When I wrote this, I thought it would work, at least with 1.12. Maybe it no longer works with 2.0.0?
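
One way to check which switches the saved cache actually carries is to read its `KEY:TYPE=VALUE` entries and list the `USE_*` ones. The snippet below is a minimal sketch under that assumption. The helper names `parse_cmake_cache` and `use_options` and the sample cache content are made up for illustration and are not taken from either ebuild.

```python
def parse_cmake_cache(text):
    """Return {name: value} for every KEY:TYPE=VALUE entry in CMakeCache.txt text."""
    entries = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and the two comment styles used in the cache file.
        if not line or line.startswith(("#", "//")):
            continue
        key_and_type, sep, value = line.partition("=")
        if not sep:
            continue
        name, _, _type = key_and_type.partition(":")
        entries[name] = value
    return entries


def use_options(entries):
    """Keep only the USE_* switches, sorted by name."""
    return {k: v for k, v in sorted(entries.items()) if k.startswith("USE_")}


if __name__ == "__main__":
    # Made-up sample; on a real system read /var/lib/caffe2/CMakeCache.txt instead.
    sample = """\
# This is the CMakeCache file.
//Hypothetical help string
USE_OPENCL:BOOL=ON
USE_ROCM:BOOL=OFF
CMAKE_BUILD_TYPE:STRING=Release
"""
    for name, value in use_options(parse_cmake_cache(sample)).items():
        print(f"{name}={value}")
```

Comparing this list with the options reported in the pytorch build log would show whether the two builds really share their settings.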

Comment 4 Andrew Cameron 2023-05-12 22:45:50 UTC

Can you check that it is actually using the settings from /var/lib/caffe2/CMakeCache.txt? Settings that are enabled for caffe2 do not seem to be enabled in pytorch once I build it.

Comment 5 Tupone Alfredo gentoo-dev

2023-05-13 17:52:04 UTC

(In reply to Andrew Cameron from comment #4)
> Can you check that it is actually using the settings from
> /var/lib/caffe2/CMakeCache.txt as settings that are enabled for caffe2 do
> not seem to be enabled in pytorch once I build it.

Please provide the build logs of both caffe2 and pytorch.

Rebuild pytorch after any change to the USE flags of caffe2.

Comment 6 Andrew Cameron 2023-05-14 14:08:53 UTC

Created attachment 861667 [details]
Caffe2 Build log with opencl enabled

Comment 7 Andrew Cameron 2023-05-14 14:09:52 UTC

Created attachment 861668 [details]
pytorch build log

The pytorch build seems to ignore the settings from caffe2.

Comment 8 Andrew Cameron 2023-05-14 14:10:32 UTC

Created attachment 861669 [details]
/var/lib/caffe2/CmakeCache.txt

Comment 9 Andrew Cameron 2023-05-14 14:11:08 UTC

Created attachment 861670 [details]
Sample pytorch run showing opencl was not included in the build

Comment 10 Andrew Cameron 2023-05-14 14:13:13 UTC

# python
Python 3.11.3 (main, May  5 2023, 10:18:46) [GCC 12.2.1 20230304] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> t2=torch.randn(1,10).to('opencl')
Error in cpuinfo: processor architecture is not supported in cpuinfo
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
RuntimeError: PyTorch is not linked with support for opencl devices
>>>

Comment 11 Tupone Alfredo gentoo-dev

2023-05-14 20:27:30 UTC

From a quick grep of the source, it seems that opencl is not available. I don't see a device registration for OpenCL. The only registrations I see are for:

CUDA, Metal, Vulkan, CPU, Meta, MPS, Lazy (some of which I am not familiar with)

And from the web

https://discuss.pytorch.org/t/runtimeerror-pytorch-is-not-linked-with-support-for-opencl-devices/164999

it seems to be the same.

I added support in the ebuild because I found the options in CMakeLists.txt, but if this is confirmed, I should drop it.

Comment 12 Andrew Cameron 2023-05-14 21:12:55 UTC

You can see the list defined here:
https://pytorch.org/cppdocs/api/program_listing_file_c10_core_DeviceType.h.html

torch itself lists these as valid device types in the current version:
RuntimeError: Expected one of cpu, cuda, ipu, xpu, mkldnn, opengl, opencl, ideep, hip, ve, fpga, ort, xla, lazy, vulkan, mps, meta, hpu, mtia, privateuseone device type at start of device string:

Try:
python
import torch
t2=torch.randn(1,10).to('cpu')
print(t2)

Each of the devices listed above can be linked into pytorch.


Here is an ebuild that includes some of the options:
https://github.com/aclex/pytorch-ebuild/blob/master/sci-libs/pytorch/pytorch-1.13.1.ebuild

On my system I want to enable the opengl, opencl and vulkan backends, but none of them work.

E.g., these should all work:
python
Python 3.11.3 (main, May  5 2023, 10:18:46) [GCC 12.2.1 20230304] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> t2=torch.randn(1,10).to('cpu')
Error in cpuinfo: processor architecture is not supported in cpuinfo
>>> print(t2)
tensor([[-0.4604,  2.6746,  0.7875,  1.1237,  0.8364,  1.4360,  0.8594, -1.1803,
          0.6411,  0.0886]])
>>> t2=torch.randn(1,10).to('opengl')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
RuntimeError: PyTorch is not linked with support for opengl devices
>>> t2=torch.randn(1,10).to('opencl')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
RuntimeError: PyTorch is not linked with support for opencl devices
>>> t2=torch.randn(1,10).to('vulkan')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
RuntimeError: PyTorch is not linked with support for vulkan devices
>>>
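
The same check can be scripted so that every device name is tried in one run. This is only a sketch: `probe_devices` and its `try_move` callback are hypothetical names, and the torch call inside the `__main__` block is the same `torch.randn(1,10).to(...)` used in the session above.

```python
def probe_devices(names, try_move):
    """Call try_move(name) for each device name and record the outcome."""
    results = {}
    for name in names:
        try:
            try_move(name)
            results[name] = "ok"
        except RuntimeError as exc:
            # Keep only the first line of the error message.
            results[name] = (str(exc).splitlines() or [""])[0]
    return results


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        torch = None  # Nothing to probe without an installed torch.

    if torch is not None:
        outcome = probe_devices(
            ["cpu", "opengl", "opencl", "vulkan"],
            lambda name: torch.randn(1, 10).to(name),
        )
        for name, status in outcome.items():
            print(f"{name}: {status}")
```

Taking the move operation as a callback keeps the helper independent of torch, so it can be exercised with a stand-in function on a machine where torch is not installed.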

Comment 13 Andrew Cameron 2023-05-14 22:46:16 UTC

I am emerging your updated versions.
I will let you know tomorrow if it makes a difference.

Comment 14 Tupone Alfredo gentoo-dev

2023-05-15 06:34:39 UTC

(In reply to Andrew Cameron from comment #12)

I understand, but looking at the error, its source is libtorch_cpu.so (in caffe2). Judging by a grep for C10_REGISTER_GUARD_IMPL, which seems to be the registration procedure for a backend, there is no opencl.

Whatever the case, since the backtrace is in caffe2, I don't think the problem is on the Python side.

Either opencl is not ready, or I was not able to support it in caffe2.
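
For anyone who wants to repeat that grep, here is a rough Python equivalent. It assumes the registrations appear in the source as `C10_REGISTER_GUARD_IMPL(<Device>, ...)`. The function name `registered_guards` and the list of file extensions are hypothetical choices, and the macro's own `#define` line will also match and report its parameter name.

```python
import os
import re
import sys

# Matches e.g. "C10_REGISTER_GUARD_IMPL(CUDA, ..." and captures the device name.
GUARD_RE = re.compile(r"C10_REGISTER_GUARD_IMPL\(\s*(\w+)\s*,")


def registered_guards(source_root):
    """Return the sorted device names passed to C10_REGISTER_GUARD_IMPL under source_root."""
    found = set()
    for dirpath, _dirs, files in os.walk(source_root):
        for fname in files:
            # Assumed set of extensions worth scanning.
            if not fname.endswith((".cpp", ".cc", ".h", ".mm", ".cu")):
                continue
            path = os.path.join(dirpath, fname)
            with open(path, encoding="utf-8", errors="replace") as handle:
                found.update(GUARD_RE.findall(handle.read()))
    return sorted(found)


if __name__ == "__main__":
    # Usage: python this_script.py /path/to/unpacked/pytorch/source
    if len(sys.argv) == 2 and os.path.isdir(sys.argv[1]):
        for device in registered_guards(sys.argv[1]):
            print(device)
```

A device name missing from the output would be consistent with the "not linked with support" error, although it would not by itself prove that the backend cannot be built.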

Comment 15 Tupone Alfredo gentoo-dev

2023-05-15 18:32:10 UTC

> On mine I want to enable the opengl opencl and vulkan backends but none of

vulkan seems to be available only for Android
opencl seems to be only a starting point for development
opengl is not selectable via cmake

Comment 16 Andrew Cameron 2023-05-15 21:05:57 UTC

I have posted the following question on the pytorch discussion forum to see what they say:
https://discuss.pytorch.org/t/compile-pytorch-2-0-with-backend-support-for-opencl-opengl-and-vulkan/179996

Comment 17 Tupone Alfredo gentoo-dev

2023-05-16 04:49:20 UTC

(In reply to Andrew Cameron from comment #16)
> I have created the following question on the pytorch discussions to see what
> they say
> https://discuss.pytorch.org/t/compile-pytorch-2-0-with-backend-support-for-
> opencl-opengl-and-vulkan/179996

thanks

Comment 18 Andrew Cameron 2023-05-28 21:44:07 UTC

Could you please add the riscv keyword to the latest ebuilds for both caffe2 and pytorch and their dependencies? As you saw from my build logs, they do compile on my riscv system.

Once you have done that, please close this bug for now.

Comment 19 Andrew Cameron 2023-06-05 21:22:11 UTC

I am closing this, as the behavior is the same when I build it from source from GitHub.