Created attachment 860035 [details]
emerge --info

Attempted to emerge dev-libs/cutlass-2.10.0 for the first time, but it failed due to a sandbox violation during source configuration. cutlass is a dependency of sci-libs/caffe2 (and thus sci-libs/pytorch) when the cuda USE flag is set. When reproducing, note that emerging cutlass fails with an error if your gcc version is > 11.3.

The relevant section of emerge output:

[...]
-- The CUDA compiler identification is NVIDIA 11.8.89
-- Detecting CUDA compiler ABI info
-- Detecting CUDA compiler ABI info - done
-- Check for working CUDA compiler: /opt/cuda/bin/nvcc - skipped
 * ACCESS DENIED: remove: /dev/char/195:255
 * ACCESS DENIED: symlink: /dev/char/195:255
-- Detecting CUDA compile features
-- Detecting CUDA compile features - done
[...]
-- Build files have been written to: /var/tmp/portage/dev-libs/cutlass-2.10.0/work/cutlass-2.10.0_build
>>> Source configured.
 * ----------------------- SANDBOX ACCESS VIOLATION SUMMARY -----------------------
 * LOG FILE: "/var/tmp/portage/dev-libs/cutlass-2.10.0/temp/sandbox.log"
 *
VERSION 1.0
FORMAT: F - Function called
FORMAT: S - Access Status
FORMAT: P - Path as passed to function
FORMAT: A - Absolute Path (not canonical)
FORMAT: R - Canonical Path
FORMAT: C - Command Line

F: remove
S: deny
P: /dev/char/195:255
A: /dev/char/195:255
R: /dev/char/195:255
C: /var/tmp/portage/dev-libs/cutlass-2.10.0/work/cutlass-2.10.0_build/CMakeFiles/3.25.3/CMakeDetermineCompilerABI_CUDA.bin

F: symlink
S: deny
P: /dev/char/195:255
A: /dev/char/195:255
R: /dev/char/195:255
C: /var/tmp/portage/dev-libs/cutlass-2.10.0/work/cutlass-2.10.0_build/CMakeFiles/3.25.3/CMakeDetermineCompilerABI_CUDA.bin
Created attachment 860036 [details] build.log
Can you tell me what device /dev/char/195:255 is on your machine?
/dev/char/195:255 points to /dev/nvidiactl
Then I think I have fixed this:

Author: Alfredo Tupone <tupone@gentoo.org>
Date:   Wed Apr 12 08:17:30 2023 +0200

    dev-libs/cutlass: fix /dev/nvidiactl sandbox issues

Can you emerge --sync and then rebuild?
Bug is still present. Ran emerge --sync and then tried to emerge cutlass. A diff on the old and new build.log files shows they're identical. My emerge --sync output didn't mention cutlass. I tried to look at the commit on cutlass' page on packages.gentoo.org and got a 503 error (https://gitweb.gentoo.org/repo/gentoo.git/commit/?id=ebe60e073ff38f2a8ecf71466baea240f675581a). Perhaps unrelated.
I tried a lil harder. Try to resync in a couple of hours, rebuild and let me know.
Did get the new version (2.10.0-r1), but bug is still present. build.log is the same as before (besides the version number).
It would be good to have a quick fix here since there is no alternative ebuild. FYI, I tried to recover the previous ebuild, but it didn't help. So someone must have changed an eclass (cuda?)
(In reply to Anton Bolshakov from comment #8)
> It would be good to have a quick fix here since there is no alternative
> ebuild.
> FYI, I tried to recover the previous ebuild, but it didn't help. So someone
> must have changed an eclass (cuda?)

Until I find a fix for that (I don't have nvidia boards), you can try

FEATURES="-sandbox" emerge cutlass
Just adding that I'm experiencing the same issue with a 3070 Ti, nvidia-cuda-toolkit-11.8.0-r3, nvidia-drivers-525.105.17.

Even with FEATURES="-sandbox" set, the result is the same:

-- Check for working CUDA compiler: /opt/cuda/bin/nvcc - skipped
 * ACCESS DENIED: remove: /dev/char/195:255
 * ACCESS DENIED: symlink: /dev/char/195:255
-- Detecting CUDA compile features

Interestingly, when I manually remove /dev/char/195:255 before the emerge, it comes back after the emerge (even though the emerge fails and doesn't make it out of the configuration stage). My best guess is that some nvidia utility is playing games here?
Adding this strace output from /var/tmp/portage/dev-libs/cutlass-2.10.0-r1/work/cutlass-2.10.0_build/CMakeFiles/3.25.3/CMakeDetermineCompilerABI_CUDA.bin

stat("/dev/nvidiactl", {st_mode=S_IFCHR|0660, st_rdev=makedev(0xc3, 0xff), ...}) = 0
stat("/dev/nvidiactl", {st_mode=S_IFCHR|0660, st_rdev=makedev(0xc3, 0xff), ...}) = 0
unlink("/dev/char/195:255") = 0
symlink("../nvidiactl", "/dev/char/195:255") = 0

So I guess something in cmake is trying to unlink() and symlink() after all.
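To connect the strace output with the denied path: makedev(0xc3, 0xff) is the major/minor pair of /dev/nvidiactl in hexadecimal, and /dev/char entries are named major:minor in decimal. A minimal sketch of that conversion follows; the helper name dev_char_path is made up for illustration and is not part of any tool mentioned in this bug.

```shell
# Hypothetical helper: print the /dev/char entry name for a character
# device, given its major and minor numbers (hex such as 0xc3 is accepted).
dev_char_path() {
	printf '/dev/char/%d:%d\n' "$1" "$2"
}

# Values taken from the st_rdev=makedev(0xc3, 0xff) field above.
dev_char_path 0xc3 0xff
```

This prints /dev/char/195:255, the same path that appears in the unlink() and symlink() calls and in the sandbox log.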
FYI, caffe2 (and maybe other cuda/nvidia related packages) has the same problem:

 * ----------------------- SANDBOX ACCESS VIOLATION SUMMARY -----------------------
 * LOG FILE: "/data/notmpfs/portage/sci-libs/caffe2-1.13.1-r4/temp/sandbox.log"
 *
VERSION 1.0
FORMAT: F - Function called
FORMAT: S - Access Status
FORMAT: P - Path as passed to function
FORMAT: A - Absolute Path (not canonical)
FORMAT: R - Canonical Path
FORMAT: C - Command Line

F: remove
S: deny
P: /dev/char/195:255
A: /dev/char/195:255
R: /dev/char/195:255
C: /data/notmpfs/portage/sci-libs/caffe2-1.13.1-r4/work/pytorch-1.13.1_build/CMakeFiles/3.25.3/CMakeDetermineCompilerABI_CUDA.bin

F: symlink
S: deny
P: /dev/char/195:255
A: /dev/char/195:255
R: /dev/char/195:255
C: /data/notmpfs/portage/sci-libs/caffe2-1.13.1-r4/work/pytorch-1.13.1_build/CMakeFiles/3.25.3/CMakeDetermineCompilerABI_CUDA.bin
Could someone try adding this to the cutlass ebuild:

addpredict /dev/char

just below the

cuda_add_sandbox -w

and report whether that works? I'm sorry I cannot test it.
(In reply to Tupone Alfredo from comment #13)
> could someone try to put on cutlass :
>
> addpredict /dev/char
>
> just below the
>
> cuda_add_sandbox -w
>
> and report if that is working.
>
> I'm sorry I cannot test it

Confirmed this worked for me!
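For anyone who wants to apply the same workaround in a local overlay, the suggested change might look roughly like the sketch below. Only the two sandbox calls come from this bug; placing them in src_configure and following them with cmake_src_configure are assumptions made for illustration, so check the actual ebuild before copying.

```shell
# Sketch of an ebuild phase function. The phase name and the trailing
# cmake_src_configure call are assumptions, not copied from the ebuild.
src_configure() {
	# cuda.eclass helper: with -w, allow writes to the NVIDIA device nodes.
	cuda_add_sandbox -w

	# The driver removes and recreates the /dev/char/195:255 symlink, so
	# predict the whole directory: the attempt is still denied, but it is
	# no longer reported as a sandbox violation.
	addpredict /dev/char

	cmake_src_configure
}
```

The directory is predicted rather than the single symlink because the denied operations are remove and symlink on entries inside /dev/char, not a plain write to an existing file.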
The bug has been closed via the following commit(s):

https://gitweb.gentoo.org/repo/gentoo.git/commit/?id=c227e389acf3143d5e6ded6b84637cc14b9798bf

commit c227e389acf3143d5e6ded6b84637cc14b9798bf
Author:     Alfredo Tupone <tupone@gentoo.org>
AuthorDate: 2023-04-23 07:44:26 +0000
Commit:     Alfredo Tupone <tupone@gentoo.org>
CommitDate: 2023-04-23 07:44:54 +0000

    dev-libs/cutlass: fix sandbox

    Closes: https://bugs.gentoo.org/904292
    Signed-off-by: Alfredo Tupone <tupone@gentoo.org>

 dev-libs/cutlass/cutlass-2.10.0-r1.ebuild | 1 +
 1 file changed, 1 insertion(+)
The bug has been referenced in the following commit(s):

https://gitweb.gentoo.org/repo/gentoo.git/commit/?id=9c1e2bb5e15c833363367382e9f1c44b9eeae0a0

commit 9c1e2bb5e15c833363367382e9f1c44b9eeae0a0
Author:     Ionen Wolkens <ionen@gentoo.org>
AuthorDate: 2023-06-04 10:47:05 +0000
Commit:     Ionen Wolkens <ionen@gentoo.org>
CommitDate: 2023-06-04 13:35:43 +0000

    x11-drivers/nvidia-drivers: use sandbox.d for /dev/nvidiactl+/dev/char

    /dev/nvidiactl been a long standing issue, sometime appearing in
    sneaky ways when a revdeps is built with opencl/cuda support even
    though the package itself does not use it.

    And /dev/char is newly needed with >=nvidia-drivers-525.105.17 or >=535.43.02,
    but not 530.41.03. The production branch's 525.105.17 is newer than
    ~arch's long-living 530 and led to this being overlooked until it hit
    stable (older stable 525.89.02 was not affected) and was unaware of
    this until rebuilt libomp[offload] with 535 today (note that 535.43.02
    is unkeyworded, it's a beta).

    Need /dev/char rather than /dev/char/195:255 given it tries to
    remove + create a symlink and does not simply try to write there.

    This is not meant to be a full coverage of nvidia devices and only
    for those being a widespread problem. Special needs or addwrite
    (typically to run tests) should be handled manually or using
    cuda.eclass' cuda_add_sandbox.

    Adding /dev/char to all versions even if not needed *yet* just
    so it's not overlooked when nvidia spreads it to other branches
    (except 390 given it's EOL, not to mention has no cuda packages
    anymore).

    Bug: https://bugs.gentoo.org/904292
    Bug: https://bugs.gentoo.org/905436
    Closes: https://bugs.gentoo.org/904944
    Signed-off-by: Ionen Wolkens <ionen@gentoo.org>

 x11-drivers/nvidia-drivers/nvidia-drivers-390.157.ebuild | 7 +++++++
 ...ivers-470.182.03.ebuild => nvidia-drivers-470.182.03-r1.ebuild} | 7 +++++++
 ...ivers-515.105.01.ebuild => nvidia-drivers-515.105.01-r1.ebuild} | 7 +++++++
 ...ivers-525.116.04.ebuild => nvidia-drivers-525.116.04-r1.ebuild} | 7 +++++++
 ...drivers-525.47.26.ebuild => nvidia-drivers-525.47.26-r1.ebuild} | 7 +++++++
 ...drivers-530.41.03.ebuild => nvidia-drivers-530.41.03-r1.ebuild} | 7 +++++++
 ...drivers-535.43.02.ebuild => nvidia-drivers-535.43.02-r1.ebuild} | 7 +++++++
 7 files changed, 49 insertions(+)
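For readers unfamiliar with sandbox.d: files under /etc/sandbox.d are shell-style variable assignments read by the sandbox, with paths separated by colons. A drop-in covering the two paths named in the commit title could look like the sketch below. The file name is an assumption, and the exact contents installed by the nvidia-drivers ebuilds are not shown in this bug, so treat this as an illustration rather than a copy of the installed file.

```shell
# Hypothetical /etc/sandbox.d/20nvidia (file name assumed for illustration).
# Paths listed in SANDBOX_PREDICT still cannot be modified, but attempts to
# do so are not reported as access violations.
SANDBOX_PREDICT="/dev/nvidiactl:/dev/char"
```

Because the file applies system-wide, individual ebuilds that merely link against cuda/opencl would not need their own addpredict calls for these two paths.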