The files nvfuser.so, functorch.so and libnvfuser_codegen.so are not installed properly in ${ED}:
tmp
└── portage
    └── sci-libs
        └── caffe2-2.0.1-r4
            └── work
                └── pytorch-2.0.1
                    ├── functorch
                    │   └── functorch.so
                    ├── nvfuser
                    │   └── nvfuser.so
                    ├── third_party
                    │   └── nvfuser
                    └── torch
                        └── lib
                            └── libnvfuser_codegen.so
This violates the FHS and causes an "installation outside prefix" error on Gentoo Prefix systems.
Also, the missing nvfuser.so causes sci-libs/pytorch to fail to build:
2023-09-23 21:28:37,453 root INFO building 'nvfuser._C' extension
2023-09-23 21:28:37,454 root INFO x86_64-pc-linux-gnu-gcc -shared -fuse-ld=gold -O2 -pipe -march=znver2 -DNDEBUG -L/opt/gentoo/usr/lib64 -o /tmp/portage/sci-libs/pytorch-2.0.1-r1/work/pytorch-2.0.1_python3.11/build/lib.linux-x86_64-cpython-311/nvfuser/_C.cpython-311-x86_64-linux-gnu.so
x86_64-pc-linux-gnu-gcc: fatal error: no input files
This happens because setup.py tries to copy ${S}/torch/nvfuser/nvfuser.so; when the copy fails, it falls back to compiling the extension and cannot find the source files.
Reproducible: Always
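To illustrate why a Prefix system rejects such an image, the Python sketch below approximates the idea behind the "installation outside prefix" check: every file in the image directory (${D}) should sit under ${D}${EPREFIX}, i.e. ${ED}. The helper name files_outside_prefix and its arguments are hypothetical, and this is not how portage actually implements its QA check.

```python
from pathlib import Path


def files_outside_prefix(image_dir: str, eprefix: str) -> list[str]:
    """Return image-relative paths of files that are not under EPREFIX.

    Hypothetical helper: image_dir plays the role of ${D} and eprefix the
    role of ${EPREFIX}. Not portage's real implementation.
    """
    root = Path(image_dir)
    # ${ED} is ${D} with ${EPREFIX} appended; strip the leading "/" so the
    # prefix is joined onto the image root instead of replacing it.
    ed = root / eprefix.lstrip("/")
    outside = []
    for path in sorted(root.rglob("*")):
        if path.is_file() and not path.is_relative_to(ed):
            # Report the path as it would appear on the live system.
            outside.append("/" + path.relative_to(root).as_posix())
    return outside
```

With eprefix="/opt/gentoo" (the prefix visible in the -L flag of the log), any file placed under ${D}/tmp/portage/... would be reported, while an empty or "/" prefix reports nothing.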
Created attachment 871192: build.log
Created attachment 871193: emerge --info
Fixed in my overlay: https://github.com/stefantalpalaru/gentoo-overlay
The bug has been closed via the following commit(s):
https://gitweb.gentoo.org/repo/gentoo.git/commit/?id=6066231bec8d7a83aff48ff16eb28e44eadd6ef4
commit 6066231bec8d7a83aff48ff16eb28e44eadd6ef4
Author:     Alfredo Tupone <tupone@gentoo.org>
AuthorDate: 2023-12-01 05:52:18 +0000
Commit:     Alfredo Tupone <tupone@gentoo.org>
CommitDate: 2023-12-01 05:53:04 +0000
    sci-libs/caffe2: install nvfuser and functorch files
    Closes: https://bugs.gentoo.org/914572
    Signed-off-by: Alfredo Tupone <tupone@gentoo.org>
 sci-libs/caffe2/caffe2-2.0.1-r5.ebuild             | 210 +++++++++++++++++++++
 sci-libs/caffe2/files/caffe2-2.0.1-cudaExtra.patch |  28 +++
 2 files changed, 238 insertions(+)
The issue with this patch is that it also installs an '__init__.py' file in '/usr/lib64', whereas this file belongs in '/usr/lib/python*/site-packages/nvfuser'.
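For reference, the running interpreter itself reports where a package directory such as nvfuser/ belongs. The Python sketch below uses the standard-library sysconfig module; the helper name module_dir and its root parameter (standing in for an image directory such as ${D}) are illustrative assumptions, not an eclass API. On a Gentoo system the result is expected to be a path under /usr/lib/python3.X/site-packages rather than directly under /usr/lib64.

```python
import sysconfig
from pathlib import Path


def module_dir(package: str, root: str = "/") -> Path:
    """Where a package directory belongs for the running interpreter.

    `root` stands in for an image directory such as ${D}; the default "/"
    gives the live-system path. Hypothetical helper, not an eclass API.
    """
    # "platlib" is the site-packages directory for platform-specific
    # (compiled) modules, which is what a package shipping a .so needs.
    site = Path(sysconfig.get_path("platlib"))
    # Re-root the absolute site-packages path under `root`.
    return Path(root) / site.relative_to(site.anchor) / package
```

Inside an ebuild, the python-utils-r1 eclass already exposes this location (for example via python_get_sitedir), so a helper like this is only meant for checking paths by hand.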
The bug has been referenced in the following commit(s):
https://gitweb.gentoo.org/repo/gentoo.git/commit/?id=a30ecc1b69628e20faa989943ff1b0dda32d9d69
commit a30ecc1b69628e20faa989943ff1b0dda32d9d69
Author:     Alfredo Tupone <tupone@gentoo.org>
AuthorDate: 2023-12-06 19:48:46 +0000
Commit:     Alfredo Tupone <tupone@gentoo.org>
CommitDate: 2023-12-06 19:49:16 +0000
    sci-libs/caffe2: install nvfuser python module
    Bug: https://bugs.gentoo.org/914572
    Signed-off-by: Alfredo Tupone <tupone@gentoo.org>
 sci-libs/caffe2/caffe2-2.1.1.ebuild | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)
(In reply to Jiezhe Wang from comment #5)
> The issue with this patch is that it will additionally install an
> '__init__.py' file in '/usr/lib64', whereas the original installation
> location for this file should be '/usr/lib/python*/site-packages/nvfuser'.
Can you test whether that is fixed in 2.1.1, or, if not, tell me what I need to do? pytorch 2.1.1 is not ready yet, though.
(In reply to Tupone Alfredo from comment #7)
> (In reply to Jiezhe Wang from comment #5)
> > The issue with this patch is that it will additionally install an
> > '__init__.py' file in '/usr/lib64', whereas the original installation
> > location for this file should be '/usr/lib/python*/site-packages/nvfuser'.
>
> Can you test if that is fixed in the 2.1.1 or if not, what I need to do.
> pytorch 2.1.1 is not yet ready though
Thank you! I will try this out over the weekend.
I confirm that the fixed caffe2-2.0.1-r5 worked well. pytorch also installed smoothly and did not install /usr/lib64/__init__.py (or maybe I misunderstood what you were talking about?)
However, I don't know what to do with nvfuser. I need to install _C.cpython-311-x86_64-linux-gnu.so inside the nvfuser Python module, but I cannot find out how to build it.
(In reply to Tupone Alfredo from comment #10)
> However I don't know what to do with nvfuser.
> I need to install _C.cpython-311-x86_64-linux-gnu.so inside the nvfuser
> python module, but I cannot find how to build it
The _C.so file is built as nvfuser.so by caffe2, as shown in comment #0. My temporary solution is to install nvfuser.so with caffe2 and then, when installing pytorch, move `nvfuser.so` and `third_party/nvfuser/python/__init__.py` into the top-level `nvfuser` directory. This step belongs after the caffe2 build. Then nvfuser should be installed automatically.
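The staging step of this workaround can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the function name stage_nvfuser is hypothetical, src_dir stands for the pytorch source tree (${S}), and nvfuser_so is wherever caffe2 installed nvfuser.so, which the comment does not specify. It only places the files; picking them up is left to setup.py.

```python
import shutil
from pathlib import Path


def stage_nvfuser(src_dir: str, nvfuser_so: str) -> Path:
    """Place nvfuser.so and nvfuser's __init__.py in a top-level nvfuser/.

    Hypothetical helper: src_dir is the pytorch source tree (${S}) and
    nvfuser_so is the library installed by caffe2 (location assumed).
    """
    src = Path(src_dir)
    target = src / "nvfuser"
    target.mkdir(exist_ok=True)
    # Copy rather than move, so the file installed by caffe2 stays in place.
    shutil.copy2(nvfuser_so, target / "nvfuser.so")
    # The package's __init__.py ships with the bundled nvfuser sources.
    shutil.copy2(src / "third_party" / "nvfuser" / "python" / "__init__.py",
                 target / "__init__.py")
    return target
```

The comment says "move"; the sketch copies instead, because removing nvfuser.so from its caffe2 location would break the file list recorded for that package.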
The bug has been referenced in the following commit(s):
https://gitweb.gentoo.org/repo/gentoo.git/commit/?id=60f42ca58ef409f2a08c17e6c62fd53b4cc5b87d
commit 60f42ca58ef409f2a08c17e6c62fd53b4cc5b87d
Author:     Alfredo Tupone <tupone@gentoo.org>
AuthorDate: 2023-12-23 16:04:31 +0000
Commit:     Alfredo Tupone <tupone@gentoo.org>
CommitDate: 2023-12-23 16:05:09 +0000
    sci-libs/caffe2: fix nvfuser python module
    Bug: https://bugs.gentoo.org/914572
    Signed-off-by: Alfredo Tupone <tupone@gentoo.org>
 sci-libs/caffe2/{caffe2-2.1.1-r5.ebuild => caffe2-2.1.1-r6.ebuild} | 1 +
 1 file changed, 1 insertion(+)
I hope I didn't break EPREFIX.
Apparently this causes a file collision... Not sure how to solve it, though...
sci-libs/caffe2-2.1.1-r6: Detected file collision(s):
 *
 *      /usr/lib/python3.10/site-packages/nvfuser/__init__.py
 *      /usr/lib/python3.10/site-packages/nvfuser/__pycache__/__init__.cpython-310.pyc
 *      /usr/lib/python3.10/site-packages/nvfuser/__pycache__/__init__.cpython-310.opt-1.pyc
 *      /usr/lib/python3.10/site-packages/nvfuser/__pycache__/__init__.cpython-310.opt-2.pyc
portageq owners / /usr/lib/python3.10/site-packages/nvfuser/__init__.py
sci-libs/pytorch-2.1.1
        /usr/lib/python3.10/site-packages/nvfuser/__init__.py
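Portage reports such a collision when a file being merged is already recorded as belonging to another installed package. A simplified Python sketch of that comparison follows. It assumes the usual vdb CONTENTS line layout ("obj <path> <md5> <mtime>"), the helper names owned_files and collisions are hypothetical, and portage's real collision handling covers more cases (directories, symlinks, config protection).

```python
def owned_files(contents: str) -> set[str]:
    """Collect regular-file paths from a CONTENTS-style listing.

    Assumes lines of the form "obj <path> <md5> <mtime>"; paths may contain
    spaces, so the last two fields are split off from the right.
    """
    owned = set()
    for line in contents.splitlines():
        if line.startswith("obj "):
            path, _md5, _mtime = line[4:].rsplit(" ", 2)
            owned.add(path)
    return owned


def collisions(new_files: list[str], other_contents: str) -> list[str]:
    """Paths the new package would install that another package already owns."""
    return sorted(set(new_files) & owned_files(other_contents))
```

Fed the CONTENTS of the installed pytorch package (under /var/db/pkg/sci-libs/) and caffe2's new file list, it would be expected to flag nvfuser/__init__.py, matching the portageq owners result above.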
The bug has been referenced in the following commit(s):
https://gitweb.gentoo.org/repo/gentoo.git/commit/?id=c884f73010d4d4459e9c9fb257a1d7d4467c7b70
commit c884f73010d4d4459e9c9fb257a1d7d4467c7b70
Author:     Alfredo Tupone <tupone@gentoo.org>
AuthorDate: 2023-12-24 11:47:13 +0000
Commit:     Alfredo Tupone <tupone@gentoo.org>
CommitDate: 2023-12-24 11:47:48 +0000
    sci-libs/pytorch: nvfuser installed in caffe2
    Bug: https://bugs.gentoo.org/914572
    Signed-off-by: Alfredo Tupone <tupone@gentoo.org>
 sci-libs/pytorch/{pytorch-2.1.1.ebuild => pytorch-2.1.1-r1.ebuild} | 2 --
 1 file changed, 2 deletions(-)
If you still have a conflict, remove pytorch before emerging caffe2, and reinstall it afterwards. Thanks for the report. If/when everything looks OK, I'll bump to pytorch 2.1.2.
Seems to work fine now... I tried rebuilding both pytorch and caffe2:
Verifying ebuild manifests
>>> Emerging (1 of 2) sci-libs/caffe2-2.1.1-r6::gentoo
>>> Installing (1 of 2) sci-libs/caffe2-2.1.1-r6::gentoo
>>> Recording sci-libs/caffe2 in "world" favorites file...
>>> Completed (1 of 2) sci-libs/caffe2-2.1.1-r6::gentoo
>>> Emerging (2 of 2) sci-libs/pytorch-2.1.1-r1::gentoo
>>> Installing (2 of 2) sci-libs/pytorch-2.1.1-r1::gentoo
>>> Completed (2 of 2) sci-libs/pytorch-2.1.1-r1::gentoo
>>> Jobs: 2 of 2 complete
Also, Merry Christmas :)