To get direct rendering working from within a systemd-nspawn container I need to create the /dev/nvidia0 node. As expected, this doesn't work:

systemd-nspawn --directory=container/debian-jessie
mknod /dev/nvidia0 c 195 0
mknod: '/dev/nvidia0': Operation not permitted

According to the man page, --capability grants one or more additional capabilities to the container. But this doesn't work either:

systemd-nspawn --capability=CAP_MKNOD --directory=container/debian-jessie
mknod /dev/nvidia0 c 195 0
mknod: '/dev/nvidia0': Operation not permitted

On the other hand, adding --register=no does allow me to create nodes, although the man page only mentions that this option controls whether the container is registered with systemd-machined.

systemd-nspawn --register=no --directory=container/debian-jessie
mknod /dev/nvidia0 c 195 0
ls /dev/nvidia0
/dev/nvidia0

I'm confused. Why does the register option change the capabilities, and why is the capability option not working? Is this a bug in systemd or a misconfiguration on my end?

See http://www.freedesktop.org/software/systemd/man/systemd-nspawn.html

--capability=
    List one or more additional capabilities to grant the container. Takes a comma-separated list of capability names, see capabilities(7) for more information. Note that the following capabilities will be granted in any way: CAP_CHOWN, CAP_DAC_OVERRIDE, CAP_DAC_READ_SEARCH, CAP_FOWNER, CAP_FSETID, CAP_IPC_OWNER, CAP_KILL, CAP_LEASE, CAP_LINUX_IMMUTABLE, CAP_NET_BIND_SERVICE, CAP_NET_BROADCAST, CAP_NET_RAW, CAP_SETGID, CAP_SETFCAP, CAP_SETPCAP, CAP_SETUID, CAP_SYS_ADMIN, CAP_SYS_CHROOT, CAP_SYS_NICE, CAP_SYS_PTRACE, CAP_SYS_TTY_CONFIG, CAP_SYS_RESOURCE, CAP_SYS_BOOT, CAP_AUDIT_WRITE, CAP_AUDIT_CONTROL. Also CAP_NET_ADMIN is retained if --private-network is specified. If the special value "all" is passed, all capabilities are retained.

--register=
    Controls whether the container is registered with systemd-machined(8). Takes a boolean argument, defaults to "yes". This option should be enabled when the container runs a full Operating System (more specifically: an init system), and is useful to ensure that the container is accessible via machinectl(1) and shown by tools such as ps(1). If the container does not run an init system, it is recommended to set this option to "no". Note that --share-system implies --register=no.

sys-apps/systemd-215-r3 was built with the following:

USE="acl cryptsetup filecaps firmware-loader gudev introspection kmod pam policykit seccomp -audit -doc -elfutils -gcrypt -http (-kdbus) -lzma -python -qrcode (-selinux) (-ssl) -test -vanilla"
ABI_X86="64 -32 -x32"
PYTHON_SINGLE_TARGET="python2_7 -python3_2 -python3_3"
PYTHON_TARGETS="python2_7 python3_3 -python3_2"
Probably this is due to CapabilityBoundingSet:

$ grep ^CapabilityBoundingSet /usr/lib/systemd/system/systemd-machined.service
CapabilityBoundingSet=CAP_KILL CAP_SYS_PTRACE CAP_SYS_ADMIN CAP_SETGID CAP_SYS_CHROOT
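One way to check a capability theory like this is to look at the bounding set of a process inside the container. The sketch below is only an illustration: has_cap is a hypothetical helper name, and it assumes the usual Linux layout where /proc/self/status exposes the mask as a hex CapBnd: line and CAP_MKNOD is capability number 27.

```shell
# CAP_MKNOD is capability number 27 in <linux/capability.h>.
CAP_MKNOD_BIT=27

# has_cap HEXMASK BIT (hypothetical helper): succeeds if bit BIT is set
# in the hexadecimal capability mask HEXMASK (given without a 0x prefix).
has_cap() {
    [ $(( (0x$1 >> $2) & 1 )) -eq 1 ]
}

# Read the bounding set inherited by this shell (Linux only; the result
# is empty where /proc is unavailable, and the check is then skipped).
mask=$(awk '/^CapBnd:/ {print $2}' /proc/self/status 2>/dev/null)
if [ -n "$mask" ]; then
    if has_cap "$mask" "$CAP_MKNOD_BIT"; then
        echo "CAP_MKNOD is in the bounding set"
    else
        echo "CAP_MKNOD is NOT in the bounding set"
    fi
fi
```

If this reports that CAP_MKNOD is present inside the container while mknod still fails, the restriction has to come from somewhere other than the capability set.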
I'm afraid I've got bad news for you:

<poetteri1g> mgorny: we use the "devices" cgroup controller to make sure that containers cannot create random device nodes
<poetteri1g> mgorny: and what the guy is doing cannot work
<poetteri1g> mgorny: device enumeration, /sys, udev, all that stuff is not virtualized in containers
<poetteri1g> mgorny: and is unlikely to ever be
<poetteri1g> mgorny: which means passing devices to containers cannot really work
<poetteri1g> mgorny: lxc makes weird claims that it could work
<poetteri1g> mgorny: but they just don't know what they are doing...
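To make the devices controller point concrete: with the cgroup v1 devices controller, the kernel exposes the active whitelist as lines such as "c 1:3 rwm" in a devices.list file, and mknod needs the "m" permission on a matching entry. The following is a rough sketch of that matching rule, not systemd's or the kernel's actual code; device_allowed is a hypothetical helper, and the location of the file for a given container (somewhere under /sys/fs/cgroup/devices/) is an assumption about the host setup.

```shell
# device_allowed TYPE MAJOR MINOR ACCESS (hypothetical helper)
# Reads whitelist entries in devices.list format on stdin, for example
#   c 1:3 rwm
#   a *:* rwm
# and succeeds if some entry grants ACCESS (one of r, w, m) for the device.
device_allowed() {
    want_type=$1 want_major=$2 want_minor=$3 want_access=$4
    while read -r type dev access; do
        major=${dev%%:*}
        minor=${dev##*:}
        # Type "a" covers both character and block devices.
        [ "$type" = a ] || [ "$type" = "$want_type" ] || continue
        # "*" is a wildcard for the major or minor number.
        [ "$major" = '*' ] || [ "$major" = "$want_major" ] || continue
        [ "$minor" = '*' ] || [ "$minor" = "$want_minor" ] || continue
        case $access in
            *"$want_access"*) return 0 ;;
        esac
    done
    return 1
}

# A whitelist without an entry for c 195:0 does not permit mknod ("m").
printf '%s\n' 'c 1:3 rwm' 'c 136:* rw' | device_allowed c 195 0 m ||
    echo "mknod of c 195:0 would be denied"
```

On a real host the input would come from the container's devices.list rather than from printf.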
Indeed, this is set in the devices cgroup controller:

systemctl status machine-debian\\x2djessie.scope
● machine-debian\x2djessie.scope - Container debian-jessie
   Loaded: loaded (/run/systemd/system/machine-debian\x2djessie.scope; static)
  Drop-In: /run/systemd/system/machine-debian\x2djessie.scope.d
           └─50-Description.conf, 50-DeviceAllow.conf, 50-DevicePolicy.conf, 50-Slice.conf
   Active: active (running) since Don 2014-07-31 09:36:01 CEST; 3min 15s ago
   CGroup: /machine.slice/machine-debian\x2djessie.scope

cat /run/systemd/system/machine-debian\\x2djessie.scope.d/50-DeviceAllow.conf
[Scope]
DeviceAllow=
DeviceAllow=char-kdbus/* rw
DeviceAllow=char-kdbus rw
DeviceAllow=char-pts rw
DeviceAllow=/dev/pts/ptmx rw
DeviceAllow=/dev/tty rwm
DeviceAllow=/dev/urandom rwm
DeviceAllow=/dev/random rwm
DeviceAllow=/dev/full rwm
DeviceAllow=/dev/zero rwm
DeviceAllow=/dev/null rwm

I'm able to create the device nodes after changing the cgroup settings.

systemctl set-property --runtime machine-debian\\x2djessie.scope DeviceAllow=/dev/nvidia0
systemctl set-property --runtime machine-debian\\x2djessie.scope DeviceAllow=/dev/nvidiactl

cat /run/systemd/system/machine-debian\\x2djessie.scope.d/50-DeviceAllow.conf
[Scope]
DeviceAllow=
DeviceAllow=/dev/nvidiactl rwm
DeviceAllow=/dev/nvidia0 rwm
DeviceAllow=/dev/null rwm
DeviceAllow=/dev/zero rwm
DeviceAllow=/dev/full rwm
DeviceAllow=/dev/random rwm
DeviceAllow=/dev/urandom rwm
DeviceAllow=/dev/tty rwm
DeviceAllow=/dev/pts/ptmx rw
DeviceAllow=char-pts rw
DeviceAllow=char-kdbus rw
DeviceAllow=char-kdbus/* rw

Thanks for asking upstream and for clarification.
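For anyone repeating this with a different machine name, the workaround can be sketched as a small dry-run helper that only prints the commands. scope_unit and allow_device_cmd are hypothetical names, and the escaping is deliberately simplified: it only turns "-" into \x2d, which is the one case that occurs in this report, whereas systemd's real unit name escaping covers more characters.

```shell
# scope_unit MACHINE (hypothetical helper): print the scope unit name for
# a registered container. Simplified: only "-" is escaped (as \x2d).
scope_unit() {
    escaped=$(printf '%s' "$1" | sed 's/-/\\x2d/g')
    printf 'machine-%s.scope\n' "$escaped"
}

# allow_device_cmd MACHINE DEVICE (hypothetical helper): print, without
# running it, the set-property command used above for one device node.
# The unit name is single-quoted so the backslash survives copy and paste.
allow_device_cmd() {
    printf "systemctl set-property --runtime '%s' DeviceAllow=%s\n" \
        "$(scope_unit "$1")" "$2"
}

# Dry run for the two nodes from this report.
allow_device_cmd debian-jessie /dev/nvidia0
allow_device_cmd debian-jessie /dev/nvidiactl
```

The printed commands would be run on the host, not inside the container. With systemd 215 the entries ended up as rwm without an explicit access string, as the 50-DeviceAllow.conf listing shows; whether other versions behave the same way is not something this report establishes.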
I hope you don't mind me closing this as WONTFIX since upstream is not willing to change that. In case there's anything else we can do, please let us know.