Suggest the following addition to the lxd startup script:

--- /etc/init.d/lxd	2020-10-18 22:01:00.857748741 +0200
+++ /etc/init.d/._cfg0000_lxd	2020-10-20 12:58:31.757567010 +0200
@@ -17,6 +17,7 @@
 	ebegin "Starting lxd service"
 	modprobe -f loop > /dev/null 2>&1
+	[ -d /sys/fs/cgroup/systemd ] || ( mkdir -p /sys/fs/cgroup/systemd ; mount -t cgroup -o none,name=systemd systemd /sys/fs/cgroup/systemd )
 	# fix permissions on /var/lib/lxd and make sure it exists
 	install -d /var/lib/lxd --group lxd --owner root --mode 0775

This addendum follows the Gentoo wiki page on LXD.

Reproducible: Always

Steps to Reproduce:
1. Install lxd on a system without systemd.

Actual Results:
Non-starting containers

Expected Results:
Starting containers
I'm running lxd without systemd happily. Do you have all the required kernel options satisfied? What does 'lxc-checkconfig' say?
lxc != lxd.....

I am using the lxd environment (which confusingly does use the lxc command to manage it, not the lxc-* tools).

My guest / container would not start:

/etc/init.d/lxd[5947]: Call to flock failed: Resource temporarily unavailable
/etc/init.d/lxd[5947]: ERROR: lxd stopped by something else

/var/log/lxd/VM/lxc.log:
lxc VM 20201020182546.105 ERROR utils - utils.c:lxc_rm_rf:1759 - No such file or directory - Failed to open dir "/sys/fs/cgroup/openrc//lxc.payload.VM-6"
lxc VM 20201020182546.106 WARN cgfsng - cgroups/cgfsng.c:cgroup_tree_remove:965 - Failed to destroy "/sys/fs/cgroup/openrc//lxc.payload.VM-6"

where the directory /sys/fs/cgroup/openrc//lxc.payload.VM-6 does exist.

I based the addition on https://wiki.gentoo.org/wiki/LXD, section 5.2, "Running systemd based containers on OpenRC hosts":

  To support systemd for e.g. Ubuntu containers the host must be modified.
  Create the systemd cgroup directory and mount the cgroup there:

  root # mkdir -p /sys/fs/cgroup/systemd
  root # mount -t cgroup -o none,name=systemd systemd /sys/fs/cgroup/systemd

And indeed I need to run an Ubuntu 18.04 for this VM instance, and I don't want to do this manually after each boot. (Most guests are based on systemd, alas, so it makes sense to make this available in the startup script.)
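As a quick way to tell whether those two wiki commands have already been run on the current boot, one can look for the named hierarchy in the mount table. This is a minimal sketch (not from the wiki); `name=systemd` is the mount option the wiki commands use:

```shell
#!/bin/sh
# Check whether a cgroup-v1 hierarchy named "systemd" is currently
# mounted, i.e. whether the wiki's mkdir+mount have been done this boot.
# Works on any Linux; prints one of two fixed messages.
if grep -q 'name=systemd' /proc/self/mounts 2>/dev/null; then
    echo "name=systemd hierarchy: mounted"
else
    echo "name=systemd hierarchy: not mounted"
fi
```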
Anyway here follows: $ lxc-checkconfig LXC version 4.0.4 Kernel configuration not found at /proc/config.gz; searching... Kernel configuration found at /lib/modules/5.4.48-gentoo-x86_64/build/.config --- Namespaces --- Namespaces: enabled Utsname namespace: enabled Ipc namespace: enabled Pid namespace: enabled User namespace: enabled Network namespace: enabled --- Control groups --- Cgroups: enabled Cgroup v1 mount points: /sys/fs/cgroup/openrc /sys/fs/cgroup/cpuset /sys/fs/cgroup/cpu /sys/fs/cgroup/cpuacct /sys/fs/cgroup/blkio /sys/fs/cgroup/memory /sys/fs/cgroup/devices /sys/fs/cgroup/freezer /sys/fs/cgroup/net_cls /sys/fs/cgroup/perf_event /sys/fs/cgroup/net_prio /sys/fs/cgroup/hugetlb /sys/fs/cgroup/pids /sys/fs/cgroup/rdma /sys/fs/cgroup/systemd Cgroup v2 mount points: /sys/fs/cgroup/unified Cgroup v1 clone_children flag: enabled Cgroup device: enabled Cgroup sched: enabled Cgroup cpu account: enabled Cgroup memory controller: enabled Cgroup cpuset: enabled --- Misc --- Veth pair device: enabled, loaded Macvlan: enabled, not loaded Vlan: enabled, loaded Bridges: enabled, loaded Advanced netfilter: enabled, loaded CONFIG_NF_NAT_IPV4: missing CONFIG_NF_NAT_IPV6: missing CONFIG_IP_NF_TARGET_MASQUERADE: enabled, not loaded CONFIG_IP6_NF_TARGET_MASQUERADE: enabled, not loaded CONFIG_NETFILTER_XT_TARGET_CHECKSUM: enabled, loaded CONFIG_NETFILTER_XT_MATCH_COMMENT: enabled, loaded FUSE (for use with lxcfs): enabled, loaded --- Checkpoint/Restore --- checkpoint restore: missing CONFIG_FHANDLE: enabled CONFIG_EVENTFD: enabled CONFIG_EPOLL: enabled CONFIG_UNIX_DIAG: enabled CONFIG_INET_DIAG: enabled CONFIG_PACKET_DIAG: enabled CONFIG_NETLINK_DIAG: enabled File capabilities: Note : Before booting a new kernel, you can check its configuration usage : CONFIG=/path/to/config /usr/bin/lxc-checkconfig
Anyway, after adding that one line the container did start.
(In reply to Nico Baggus from comment #2)
> lxc != lxd.....
>
> I am using the lxd environment (which confusingly does use the lxc command
> to manage it, not the lxc-* ).

Heh, yes, but the kernel config requirements are rather similar and lxc-checkconfig rules out the most common issues.

> My guest / container would not start.
>
> /etc/init.d/lxd[5947]: Call to flock failed: Resource temporarily unavailable
> /etc/init.d/lxd[5947]: ERROR: lxd stopped by something else
>
> /var/log/lxd/VM/lxc.log:
> lxc VM 20201020182546.105 ERROR utils - utils.c:lxc_rm_rf:1759 - No such
> file or directory - Failed to open dir
> "/sys/fs/cgroup/openrc//lxc.payload.VM-6"
> lxc VM 20201020182546.106 WARN cgfsng -
> cgroups/cgfsng.c:cgroup_tree_remove:965 - Failed to destroy
> "/sys/fs/cgroup/openrc//lxc.payload.VM-6"
>
> Where the file /sys/fs/cgroup/openrc//lxc.payload.VM-6 does exist
>
> I based it on https://wiki.gentoo.org/wiki/LXD Section 5.2:
>
> Running systemd based containers on OpenRC hosts
> To support systemd for e.g. ubuntu containers the host must be modified:
>
> Create the system cgroup directory and mount the cgroup there:
>
> root # mkdir -p /sys/fs/cgroup/systemd
> root # mount -t cgroup -o none,name=systemd systemd /sys/fs/cgroup/systemd
>
> And indeed i need to run an ubuntu 18.04 for this VM instance
> And don't want to do this manual after each boot...
> (Mosts guest are based on systemd alas, so it makes send to make this
> available in the startup script)

Now I understand the problem. Indeed, I've never tried running a systemd container on an OpenRC host. I'll try this out, thanks!
The bug has been closed via the following commit(s):

https://gitweb.gentoo.org/repo/gentoo.git/commit/?id=4b4cbd6f7c78abe9d831c8425b2a4ebdbba298ca

commit 4b4cbd6f7c78abe9d831c8425b2a4ebdbba298ca
Author:     Joonas Niilola <juippis@gentoo.org>
AuthorDate: 2020-10-21 06:03:53 +0000
Commit:     Joonas Niilola <juippis@gentoo.org>
CommitDate: 2020-10-21 06:03:53 +0000

    app-emulation/lxd: fix init.d to allow systemd cont on openrc host

    Closes: https://bugs.gentoo.org/750410
    Signed-off-by: Joonas Niilola <juippis@gentoo.org>

 app-emulation/lxd/files/lxd-4.0.0.initd                     | 3 +++
 app-emulation/lxd/{lxd-4.0.3.ebuild => lxd-4.0.3-r1.ebuild} | 2 +-
 2 files changed, 4 insertions(+), 1 deletion(-)
Hey, sorry for opening this up again, but this small one-liner is opening up another problem.

In a non-systemd container (Gentoo) on a non-systemd host OS (Gentoo), an application using cgroups is searching for the folder /sys/fs/cgroup/systemd, which does not exist inside the container, BUT cat /proc/1/cgroup:

14:name=systemd:/
13:pids:/
12:hugetlb:/
11:net_prio:/
10:perf_event:/
9:net_cls:/
8:freezer:/
7:devices:/
6:memory:/
5:blkio:/
4:cpuacct:/
3:cpu:/
2:cpuset:/
1:name=openrc:/
0::/

mentions the systemd folder.

I noticed the problem with docker inside of the lxd container, which gave me this error: "cgroups: cannot find cgroup mount destination: unknown". I used lxd-4.0.1 before and everything worked fine; with lxd-4.0.3-r1 I get the problem.

So maybe the solution for containers with systemd is not that ideal for the rest, and a better solution is needed? I read I can use this fix inside the container as well, but I don't know the implications, because I don't understand the technicalities behind this (?)namespace(?).

Greetings
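The mismatch described here can be made visible by listing the named v1 hierarchies that /proc/1/cgroup advertises: each `name=X` entry is expected to have a matching mount under /sys/fs/cgroup/X, and the missing systemd one is roughly what docker/runc trips over. A sketch, using the listing above as canned sample input rather than the live file:

```shell
#!/bin/sh
# Extract the named cgroup-v1 hierarchies from a /proc/1/cgroup-style
# listing. On a live system you would read /proc/1/cgroup instead of
# the canned sample; each printed name X should have a corresponding
# mount at /sys/fs/cgroup/X, which is what breaks inside the container.
sample='14:name=systemd:/
1:name=openrc:/
0::/'
printf '%s\n' "$sample" | awk -F: '$2 ~ /^name=/ { sub(/^name=/, "", $2); print $2 }'
# prints:
#   systemd
#   openrc
```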
That prohibits running systemd-requiring lxd guests together with non-systemd lxd guests...

I hope someone with more systemd experience can shine a light here... I'll look and try whether there is another solution.
(In reply to darthm from comment #7)
> Hey,
>
> sorry for opening this up again.
> But this small one liner is opening up another problem.

Don't be; if it's broken, it needs to be fixed.

1: If you manually do

  mkdir -p /sys/fs/cgroup/systemd ; mount -t cgroup -o none,name=systemd systemd /sys/fs/cgroup/systemd

inside the container, does it fix the problem?

2: If you use the previous file,
https://gitweb.gentoo.org/repo/gentoo.git/plain/app-emulation/lxd/files/lxd-4.0.0.initd?id=a7d7c673797b5ff17c5a28b9d1a131d000cac3d7
does it work alright?

Not sure whether the "systemd fix" should be reverted or not when weighing the gains. There might be another way to solve this.
Hey, if you use the old init.d script, sure it will work. But if you started the service before, you have to restart the system first, because just unmounting /sys/fs/cgroup/systemd won't remove the entry from /proc/1/cgroup, and I don't know how to remove the entry.

Like I said, I'm just using cgroups now because I got into kubernetes/docker stuff, but I don't understand the technical implications of using and modifying this. Maybe it's not a big deal at all, because it's not important in which namespace (is this entry a namespace?) the programs are running.

Btw. this systemd fix is not only affecting the containers inside of lxd, it's affecting the WHOLE host OS too, because all services and applications which are started after the lxd service will use the /sys/fs/cgroup/systemd path because of the name=systemd entry. At least it breaks all non-systemd, cgroup-using containers: docker inside of lxd will break the whole container. It tries to start the docker containers until it is OOM, and then it will kill/stop the lxd container. And I guess docker is only one example of it.

In my eyes this fix needs to be reverted, or someone with good cgroup knowledge has to explain the implications of this name entry, and that it is not a problem or even introducing a systemd dependency into a non-systemd system.

Greetings
(I think juippis was just trying to check if that even is a workaround to understand the issue fully.)
The bug has been referenced in the following commit(s):

https://gitweb.gentoo.org/repo/gentoo.git/commit/?id=1e6334d771a7f7b7b38c8221a5649f2e81c5ce8a

commit 1e6334d771a7f7b7b38c8221a5649f2e81c5ce8a
Author:     Joonas Niilola <juippis@gentoo.org>
AuthorDate: 2020-10-26 06:25:22 +0000
Commit:     Joonas Niilola <juippis@gentoo.org>
CommitDate: 2020-10-26 06:29:59 +0000

    app-emulation/lxd: revert openrc -> systemd-cgroups fix

    - it now breaks openrc -> openrc-cgroups instead. A better solution
      needs to be found. Please see
      https://wiki.gentoo.org/wiki/LXD#Running_systemd_based_containers_on_OpenRC_hosts
      for a workaround, that you can add to your own init.d file if needed.

    Bug: https://bugs.gentoo.org/750410
    Signed-off-by: Joonas Niilola <juippis@gentoo.org>

 app-emulation/lxd/files/lxd-4.0.0.initd                        | 3 ---
 app-emulation/lxd/{lxd-4.0.3-r1.ebuild => lxd-4.0.3-r2.ebuild} | 0
 app-emulation/lxd/{lxd-4.0.4.ebuild => lxd-4.0.4-r1.ebuild}    | 0
 3 files changed, 3 deletions(-)
I did some testing. Devuan (sysv-init) doesn't seem to have issues... I didn't verify a lot of programs.

Launching a gentoo container + emerge docker inside it:

Any openrc script will mention a ulimit failure:
/lib/rc/sh/openrc-run.sh: line 258: ulimit: open files: cannot modify limit: Operation not permitted

openrc on its own fails:
 * Configuring kernel parameters ...
sysctl: permission denied on key "fs.protected_symlinks"
sysctl: permission denied on key "fs.protected_hardlinks"
 * Unable to configure some kernel parameters
 [ !! ]
 * ERROR: sysctl failed to start
(caused because the "root" inside the lxd != root on the host.)

The docker fails with:
time="2020-10-26T07:53:03.525663749Z" level=error msg="Failed to built-in GetDriver graph overlay /var/lib/docker"
time="2020-10-26T07:53:03.525681112Z" level=error msg="Failed to built-in GetDriver graph devicemapper /var/lib/docker"
time="2020-10-26T07:53:03.956940242Z" level=warning msg="Your kernel does not support cgroup blkio weight"
time="2020-10-26T07:53:03.957005179Z" level=warning msg="Your kernel does not support cgroup blkio weight_device"
time="2020-10-26T07:53:03.958153537Z" level=info msg="Loading containers: start."
time="2020-10-26T07:53:04.666254685Z" level=info msg="Default bridge (docker0) is assigned with an IP address 172.17.0.0/16. Daemon option --bip can be used to set a preferred IP address"
time="2020-10-26T07:53:05.020395691Z" level=info msg="Loading containers: done."
time="2020-10-26T07:53:05.051482457Z" level=info msg="Docker daemon" commit=4484c46d9d graphdriver(s)=vfs version=19.03.13
time="2020-10-26T07:53:05.051612900Z" level=info msg="Daemon has completed initialization"
time="2020-10-26T07:53:05.185888042Z" level=info msg="API listen on /var/run/docker.sock"

With the mount command issued INSIDE an lxd container:

[ -d /sys/fs/cgroup/systemd ] || ( mkdir -p /sys/fs/cgroup/systemd ; mount -t cgroup -o none,name=systemd systemd /sys/fs/cgroup/systemd )

docker does start.
Now I restarted lxd without the mount, after umounting /sys/fs/cgroup/systemd. All the guest containers do start (the systemd directory still does exist), also the ones requiring systemd...

/etc/init.d/docker start
/lib/rc/sh/openrc-run.sh: line 258: ulimit: open files: cannot modify limit: Operation not permitted
 * Starting docker ...
 [ ok ]

blabla2 ~ # docker ps
CONTAINER ID        IMAGE               COMMAND             CREATED             STATUS              PORTS               NAMES

So just creating the directory on the host seems to be sufficient for anything HAVING systemd; there is no need to mount on the host. Guests without systemd don't care, as there is no mount.

I suggest the following is tested as well:

@@ -17,6 +17,7 @@
 	ebegin "Starting lxd service"
 	modprobe -f loop > /dev/null 2>&1
+	[ -d /sys/fs/cgroup/systemd ] || mkdir -p /sys/fs/cgroup/systemd
 	# fix permissions on /var/lib/lxd and make sure it exists
 	install -d /var/lib/lxd --group lxd --owner root --mode 0775
Another consideration: running docker within lxd means running nested cgroup handling. I am not sure what issues that may cause.
I later saw that the systemd mount is only forgotten after a host reboot, and indeed that is the case. So only creating the directory is not a solution.

Without it, an Ubuntu 18.04 guest looks like this (20 minutes after starting...):

 9310 ?  Ss  0:00 [lxc monitor] /var/lib/lxd/containers code
 9318 ?  Ss  0:00  \_ /sbin/init

(/sbin/init is linked to lib/systemd/systemd)

So a systemd guest is a non-starter. It should run a Collabora service for a Nextcloud environment.

Next suggestion: put a conditional line in init.d/lxd and create a conf.d/lxd that allows enabling it (or not).
Adding the line by hand means ADDING it again at every update, or at least allowing all init.d/ files to be handled with dispatch-conf / etc-update and manually preventing the update.
As I said, by mounting the cgroup folder you are not only affecting the containers but the host system itself. Any docker (just an example of an application which is using cgroups) running outside of lxd (on the host) would use the systemd path instead of the openrc path, and I don't know which implications that has. Keep that in mind.

But I could live with a conditional clause inside the init script which checks for settings in the conf.d file. The setting should be off by default and commented with a hint that everybody who activates it and has problems with cgroups has to disable it first before filing any bug (something like that, I guess you get the point).

By the way, for docker inside of lxd you have to configure the guest for special permissions, but that's off topic I guess.

Thanks for caring, and greetings
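A sketch of what such a conf.d-guarded hook could look like. The variable name LXD_SYSTEMD_CGROUP is invented here (it is not an existing Gentoo option), and the mount action is only echoed, so the off-by-default decision logic can be shown without root:

```shell
#!/bin/sh
# Sketch of the proposed opt-in hook. LXD_SYSTEMD_CGROUP stands in for a
# hypothetical /etc/conf.d/lxd variable that would default to unset (off).
maybe_mount_systemd_cgroup() {
    # $1: value of the conf.d switch, $2: target cgroup directory.
    if [ "$1" = "yes" ] && [ ! -d "$2" ]; then
        # The real init.d line would instead run:
        #   mkdir -p "$2" && mount -t cgroup -o none,name=systemd systemd "$2"
        echo "would mount name=systemd on $2"
    else
        echo "skipping systemd cgroup mount"
    fi
}

maybe_mount_systemd_cgroup yes /nonexistent/cgroup/systemd
maybe_mount_systemd_cgroup ""  /nonexistent/cgroup/systemd
```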
IMHO systemd is a turd... however polished, a turd. The problem is that the Collabora server can only run in systemd-enabled environments. I haven't noticed ill side effects on the host yet (N=1 observation).
This issue has been addressed in the wiki - https://wiki.gentoo.org/wiki/LXD#Running_systemd_based_containers_on_OpenRC_hosts Please confirm this fix works and close if appropriate.
I tried the solution and the LXC clients could not start. I'll try to verify where the issue might be; I need to check if it is with cgroups v2.
Is this still a problem? OpenRC has been updated multiple times, and even lxd's init.d file has received quite many updates since.
It now seems to work.

With the most recent update, devices are now not renamed to "eth0" but keep their random veth... names, causing the containers to NOT fetch their addresses.
(In reply to Nico Baggus from comment #22)
> It now seems to work.

Glad to hear it!

> With the most recent update: devices are now not renamed to "eth0" but keep
> their veth.... random name causing the containers to NOT fetch their
> addresses).

AFAIK this change didn't land on the LTS branch that is present in the Gentoo repo, but somewhere later. I tested with lxd-4.0.6 and have no problems here creating new containers with network. Maybe it's also a profile thing for new setups... hmm...
profile:

config: {}
description: Default LXD profile
devices:
  eth0:
    name: eth0
    nictype: bridged
    parent: br0
    type: nic
  root:
    path: /
    pool: default
    type: disk
name: default
used_by:
- /1.0/instances/nc-beta

nc-beta is such a debian-based system that needs the name to be hardcoded in config files.
Alas. If set up with

rc_cgroup_mode="unified"
rc_cgroup_controllers="systemd"

the eth0 devices are not set up in the container systems; they are still mentioned with random vethXXXXXX names. CentOS containers can cope (CentOS detects devices by MAC address) and will receive their DHCP-assigned addresses. Debian-based containers are missing eth0: and therefore have no concept of what to do (no match on device name). Due to the unpredictable nature of veth device names, there is no fix here.

So I reverted back to:

rc_cgroup_mode="hybrid"

and mounting my /sys/fs/cgroup... bingo, eth0 devices...

The problem might very well be in lxcfs: /var/lib/lxcfs now shows:

drwxr-xr-x 2 root root 0 May  2 22:07 cgroup
dr-xr-xr-x 2 root root 0 May  2 22:07 proc
dr-xr-xr-x 2 root root 0 May  2 22:07 sys

while with rc_cgroup_mode=unified, cgroup shows as:

????????? ? ? ? ? ??? ?? ??:?? cgroup

rc_cgroup_mode=hybrid has also been tried without my private mount, resulting in no starting containers.