Bug 904292 - dev-libs/cutlass-2.10.0: emerge fails due to sandbox violation (cuda tries to write to /dev/char*)
Status: RESOLVED FIXED
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: Current packages
Hardware: All Linux
Importance: Normal normal
Assignee: Tupone Alfredo
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2023-04-14 00:44 UTC by glyphimor
Modified: 2023-06-04 13:36 UTC
CC List: 3 users

See Also:
Package list:
Runtime testing required: ---


Attachments
emerge --info (file_904292.txt,6.59 KB, text/plain)
2023-04-14 00:44 UTC, glyphimor
build.log (build.log,5.02 KB, text/plain)
2023-04-14 00:47 UTC, glyphimor

Description glyphimor 2023-04-14 00:44:29 UTC
Created attachment 860035
emerge --info

Attempted to emerge dev-libs/cutlass-2.10.0 for the first time, but it failed due to a sandbox violation during source configuration.

cutlass is a dependency of sci-libs/caffe2 (and thus sci-libs/pytorch) when the cuda USE flag is set.

When reproducing, note that emerging cutlass fails with an error if your GCC version is newer than 11.3.

The relevant section of emerge output:

[...]

-- The CUDA compiler identification is NVIDIA 11.8.89
-- Detecting CUDA compiler ABI info
-- Detecting CUDA compiler ABI info - done
-- Check for working CUDA compiler: /opt/cuda/bin/nvcc - skipped
 * ACCESS DENIED:  remove:        /dev/char/195:255
 * ACCESS DENIED:  symlink:       /dev/char/195:255
-- Detecting CUDA compile features
-- Detecting CUDA compile features - done

[...]

-- Build files have been written to: /var/tmp/portage/dev-libs/cutlass-2.10.0/work/cutlass-2.10.0_build
>>> Source configured.
 * ----------------------- SANDBOX ACCESS VIOLATION SUMMARY -----------------------
 * LOG FILE: "/var/tmp/portage/dev-libs/cutlass-2.10.0/temp/sandbox.log"
 * 
VERSION 1.0
FORMAT: F - Function called
FORMAT: S - Access Status
FORMAT: P - Path as passed to function
FORMAT: A - Absolute Path (not canonical)
FORMAT: R - Canonical Path
FORMAT: C - Command Line

F: remove
S: deny
P: /dev/char/195:255
A: /dev/char/195:255
R: /dev/char/195:255
C: /var/tmp/portage/dev-libs/cutlass-2.10.0/work/cutlass-2.10.0_build/CMakeFiles/3.25.3/CMakeDetermineCompilerABI_CUDA.bin

F: symlink
S: deny
P: /dev/char/195:255
A: /dev/char/195:255
R: /dev/char/195:255
C: /var/tmp/portage/dev-libs/cutlass-2.10.0/work/cutlass-2.10.0_build/CMakeFiles/3.25.3/CMakeDetermineCompilerABI_CUDA.bin
Comment 1 glyphimor 2023-04-14 00:47:44 UTC
Created attachment 860036
build.log
Comment 2 Tupone Alfredo gentoo-dev 2023-04-14 13:38:47 UTC
Can you tell me what device /dev/char/195:255 is on your machine?
Comment 3 glyphimor 2023-04-14 15:14:47 UTC
/dev/char/195:255 points to /dev/nvidiactl
Comment 4 Tupone Alfredo gentoo-dev 2023-04-14 21:05:42 UTC
Then I think I have fixed this:

Author: Alfredo Tupone <tupone@gentoo.org>
Date:   Wed Apr 12 08:17:30 2023 +0200

    dev-libs/cutlass: fix /dev/nvidiactl sandbox issues

Can you run emerge --sync and then rebuild?
Comment 5 glyphimor 2023-04-15 04:55:24 UTC
Bug is still present. Ran emerge --sync and then tried to emerge cutlass. A diff on the old and new build.log files shows they're identical.

My emerge --sync output didn't mention cutlass. I tried to look at the commit on cutlass' page on packages.gentoo.org and got a 503 error (https://gitweb.gentoo.org/repo/gentoo.git/commit/?id=ebe60e073ff38f2a8ecf71466baea240f675581a). Perhaps unrelated.
Comment 6 Tupone Alfredo gentoo-dev 2023-04-15 17:41:40 UTC
I tried a little harder.

Try to resync in a couple of hours, rebuild and let me know.
Comment 7 glyphimor 2023-04-16 17:16:57 UTC
Did get the new version (2.10.0-r1), but bug is still present. build.log is the same as before (besides the version number).
Comment 8 Anton Bolshakov 2023-04-18 08:10:22 UTC
It would be good to have a quick fix here since there is no alternative ebuild.
FYI, I tried to recover the previous ebuild, but it didn't help. So someone must have changed an eclass (cuda?)
Comment 9 Tupone Alfredo gentoo-dev 2023-04-20 16:51:07 UTC
(In reply to Anton Bolshakov from comment #8)
> It would be good to have a quick fix here since there is no alternative
> ebuild.
> FYI, I tried to recover the previous ebuild, but it didn't help. So someone
> must have changed an eclass (cuda?)

Until I find a fix for that (I don't have NVIDIA boards), you can try:

FEATURES="-sandbox" emerge cutlass
Comment 10 Cyan Garamonde 2023-04-22 03:28:13 UTC
Just adding that I'm experiencing the same issue with a 3070 Ti, nvidia-cuda-toolkit-11.8.0-r3, nvidia-drivers-525.105.17

Even with FEATURES="-sandbox" set, the result is the same:

-- Check for working CUDA compiler: /opt/cuda/bin/nvcc - skipped
 * ACCESS DENIED:  remove:        /dev/char/195:255
 * ACCESS DENIED:  symlink:       /dev/char/195:255
-- Detecting CUDA compile features

Interestingly when I remove /dev/char/195:255 myself manually before the emerge, it comes back after the emerge (even though the emerge fails and doesn't make it out of the configuration stage).

My best guess is some nvidia utility is playing games here?
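
One possible reason the workaround had no effect (an assumption, not confirmed anywhere in this thread): when Portage builds with the userpriv feature enabled, the sandbox for those phases is controlled by the separate usersandbox feature, so turning off sandbox alone may leave it active. A minimal sketch of a per-package override follows; the file name no-sandbox.conf is hypothetical.

```shell
# Hypothetical file: /etc/portage/env/no-sandbox.conf (the name is an assumption).
# Turns off both sandbox-related FEATURES for packages mapped to this file.
# This removes sandbox protection for the whole build, so treat it as a
# temporary workaround only.
FEATURES="-sandbox -usersandbox"
```

To apply it to cutlass only, /etc/portage/package.env would need a line mapping dev-libs/cutlass to no-sandbox.conf.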
Comment 11 Cyan Garamonde 2023-04-22 03:43:35 UTC
Adding this strace output from /var/tmp/portage/dev-libs/cutlass-2.10.0-r1/work/cutlass-2.10.0_build/CMakeFiles/3.25.3/CMakeDetermineCompilerABI_CUDA.bin

stat("/dev/nvidiactl", {st_mode=S_IFCHR|0660, st_rdev=makedev(0xc3, 0xff), ...}) = 0
stat("/dev/nvidiactl", {st_mode=S_IFCHR|0660, st_rdev=makedev(0xc3, 0xff), ...}) = 0
unlink("/dev/char/195:255")             = 0
symlink("../nvidiactl", "/dev/char/195:255") = 0

So I guess something in cmake is trying to unlink() and symlink() after all.
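
The strace output ties the two paths together: makedev(0xc3, 0xff) is the device number of /dev/nvidiactl, and 0xc3:0xff in decimal is 195:255, the name of the symlink under /dev/char. A small sketch of that conversion, where char_link_path is a hypothetical helper name:

```shell
# Convert a hex major/minor pair (as seen in strace's makedev(0xc3, 0xff))
# into the matching /dev/char/MAJOR:MINOR path, which uses decimal numbers.
char_link_path() {
	printf '/dev/char/%d:%d\n' "0x$1" "0x$2"
}

char_link_path c3 ff

# On a machine that has the device, derive the same path from the node itself.
# GNU stat: %t and %T print the device major and minor in hex.
if [ -c /dev/nvidiactl ]; then
	set -- $(stat -c '%t %T' /dev/nvidiactl)
	char_link_path "$1" "$2"
fi
```

On an affected system, readlink on the printed path should show where the symlink points.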
Comment 12 Anton Bolshakov 2023-04-22 14:23:05 UTC
FYI, caffe2 (and maybe other CUDA/NVIDIA-related packages) has the same problem:

 * ----------------------- SANDBOX ACCESS VIOLATION SUMMARY -----------------------
 * LOG FILE: "/data/notmpfs/portage/sci-libs/caffe2-1.13.1-r4/temp/sandbox.log"
 * 
VERSION 1.0
FORMAT: F - Function called
FORMAT: S - Access Status
FORMAT: P - Path as passed to function
FORMAT: A - Absolute Path (not canonical)
FORMAT: R - Canonical Path
FORMAT: C - Command Line

F: remove
S: deny
P: /dev/char/195:255
A: /dev/char/195:255
R: /dev/char/195:255
C: /data/notmpfs/portage/sci-libs/caffe2-1.13.1-r4/work/pytorch-1.13.1_build/CMakeFiles/3.25.3/CMakeDetermineCompilerABI_CUDA.bin 

F: symlink
S: deny
P: /dev/char/195:255
A: /dev/char/195:255
R: /dev/char/195:255
C: /data/notmpfs/portage/sci-libs/caffe2-1.13.1-r4/work/pytorch-1.13.1_build/CMakeFiles/3.25.3/CMakeDetermineCompilerABI_CUDA.bin
Comment 13 Tupone Alfredo gentoo-dev 2023-04-22 19:21:20 UTC
Could someone try adding this to the cutlass ebuild:

addpredict /dev/char

just below the line

cuda_add_sandbox -w

and report whether that works?

I'm sorry, I cannot test it myself.
Comment 14 Cyan Garamonde 2023-04-22 21:27:59 UTC
(In reply to Tupone Alfredo from comment #13)
> could someone try to put on cutlass :
> 
> addpredict /dev/char
> 
> just below the 
> 
> cuda_add_sandbox -w
> 
> and report if that is working.
> 
> I'm sorry I cannot test it

Confirmed this worked for me!
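
For readers following along, the suggested change can be sketched as below. This is not the actual ebuild: the thread does not say which phase function holds the cuda_add_sandbox call, so configure_sketch is a hypothetical name, and the two stand-in definitions exist only so the snippet runs outside Portage.

```shell
# Stand-ins so the sketch runs outside Portage. In a real ebuild, addpredict
# is provided by Portage and cuda_add_sandbox by cuda.eclass; these stubs
# only approximate them.
command -v addpredict >/dev/null 2>&1 ||
	addpredict() { SANDBOX_PREDICT="${SANDBOX_PREDICT:+${SANDBOX_PREDICT}:}$1"; }
command -v cuda_add_sandbox >/dev/null 2>&1 ||
	cuda_add_sandbox() { :; }

# Hypothetical phase body (the real phase name is not given in the thread).
configure_sketch() {
	cuda_add_sandbox -w
	# Predict accesses under /dev/char: the remove + symlink of
	# /dev/char/195:255 is still denied, but no longer counts as a violation.
	addpredict /dev/char
}

configure_sketch
echo "SANDBOX_PREDICT=${SANDBOX_PREDICT}"
```

The whole /dev/char directory is predicted rather than the single entry because the failing operations remove and recreate the symlink itself.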
Comment 15 Larry the Git Cow gentoo-dev 2023-04-23 07:45:19 UTC
The bug has been closed via the following commit(s):

https://gitweb.gentoo.org/repo/gentoo.git/commit/?id=c227e389acf3143d5e6ded6b84637cc14b9798bf

commit c227e389acf3143d5e6ded6b84637cc14b9798bf
Author:     Alfredo Tupone <tupone@gentoo.org>
AuthorDate: 2023-04-23 07:44:26 +0000
Commit:     Alfredo Tupone <tupone@gentoo.org>
CommitDate: 2023-04-23 07:44:54 +0000

    dev-libs/cutlass: fix sandbox
    
    Closes: https://bugs.gentoo.org/904292
    Signed-off-by: Alfredo Tupone <tupone@gentoo.org>

 dev-libs/cutlass/cutlass-2.10.0-r1.ebuild | 1 +
 1 file changed, 1 insertion(+)
Comment 16 Larry the Git Cow gentoo-dev 2023-06-04 13:36:16 UTC
The bug has been referenced in the following commit(s):

https://gitweb.gentoo.org/repo/gentoo.git/commit/?id=9c1e2bb5e15c833363367382e9f1c44b9eeae0a0

commit 9c1e2bb5e15c833363367382e9f1c44b9eeae0a0
Author:     Ionen Wolkens <ionen@gentoo.org>
AuthorDate: 2023-06-04 10:47:05 +0000
Commit:     Ionen Wolkens <ionen@gentoo.org>
CommitDate: 2023-06-04 13:35:43 +0000

    x11-drivers/nvidia-drivers: use sandbox.d for /dev/nvidiactl+/dev/char
    
    /dev/nvidiactl has been a long-standing issue, sometimes appearing in
    sneaky ways when a revdep is built with opencl/cuda support even though
    the package itself does not use it.
    
    And /dev/char is newly needed with >=nvidia-drivers-525.105.17 or
    >=535.43.02, but not 530.41.03. The production branch's 525.105.17
    is newer than ~arch's long-living 530, which led to this being
    overlooked until it hit stable (older stable 525.89.02 was not
    affected). I was unaware of this until I rebuilt libomp[offload] with
    535 today (note that 535.43.02 is unkeyworded; it's a beta).
    
    Need /dev/char rather than /dev/char/195:255 given it tries to remove
    + create a symlink and does not simply try to write there.
    
    This is not meant to be a full coverage of nvidia devices and only
    for those being a widespread problem. Special needs or addwrite
    (typically to run tests) should be handled manually or using
    cuda.eclass' cuda_add_sandbox.
    
    Adding /dev/char to all versions even if not needed *yet* just so it's
    not overlooked when nvidia spreads it to other branches (except 390
    given it's EOL, not to mention has no cuda packages anymore).
    
    Bug: https://bugs.gentoo.org/904292
    Bug: https://bugs.gentoo.org/905436
    Closes: https://bugs.gentoo.org/904944
    Signed-off-by: Ionen Wolkens <ionen@gentoo.org>

 x11-drivers/nvidia-drivers/nvidia-drivers-390.157.ebuild           | 7 +++++++
 ...ivers-470.182.03.ebuild => nvidia-drivers-470.182.03-r1.ebuild} | 7 +++++++
 ...ivers-515.105.01.ebuild => nvidia-drivers-515.105.01-r1.ebuild} | 7 +++++++
 ...ivers-525.116.04.ebuild => nvidia-drivers-525.116.04-r1.ebuild} | 7 +++++++
 ...drivers-525.47.26.ebuild => nvidia-drivers-525.47.26-r1.ebuild} | 7 +++++++
 ...drivers-530.41.03.ebuild => nvidia-drivers-530.41.03-r1.ebuild} | 7 +++++++
 ...drivers-535.43.02.ebuild => nvidia-drivers-535.43.02-r1.ebuild} | 7 +++++++
 7 files changed, 49 insertions(+)
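
The commit above moves the workaround out of individual ebuilds and into a sandbox.d entry shipped by nvidia-drivers. The diff itself is not reproduced in this thread, so the following is only a sketch of what such an entry could look like, assuming predict access is used as in the addpredict fix from comment 13. The file name is hypothetical.

```shell
# Hypothetical /etc/sandbox.d/ entry (name and exact contents are assumptions).
# Colon-separated paths whose write attempts are denied without being
# reported as sandbox violations.
SANDBOX_PREDICT="/dev/nvidiactl:/dev/char"
```

Listing /dev/char instead of /dev/char/195:255 follows the reasoning in the commit message: the driver removes and recreates the symlink rather than writing through it.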