When executing `lxd init` as described in https://wiki.gentoo.org/wiki/LXD with `cgroup2 on /sys/fs/cgroup type cgroup2 (rw,nosuid,nodev,noexec,relatime)`, lxd hangs and does not respond.

Reproducible: Always

Steps to Reproduce:
1. emerge app-emulation/lxd
2. lxd init

Actual Results:
Hangs and doesn't react.

Expected Results:
The init process starts.

# gdb lxd init
Backtrack.log
```
#0  runtime.epollwait () at /usr/lib/go/src/runtime/sys_linux_amd64.s:725
#1  0x000000000043fb92 in runtime.netpoll (delay=-1, ~r1=...) at /usr/lib/go/src/runtime/netpoll_epoll.go:126
#2  0x000000000044bcae in runtime.findrunnable (gp=0xc000059800, inheritTime=false) at /usr/lib/go/src/runtime/proc.go:2923
#3  0x000000000044d337 in runtime.schedule () at /usr/lib/go/src/runtime/proc.go:3169
#4  0x000000000044d8bd in runtime.park_m (gp=0xc000001200) at /usr/lib/go/src/runtime/proc.go:3318
#5  0x000000000047c2fb in runtime.mcall () at /usr/lib/go/src/runtime/asm_amd64.s:327
#6  0x000000000047c1f4 in runtime.rt0_go () at /usr/lib/go/src/runtime/asm_amd64.s:226
#7  0x0000000000000000 in ?? ()
```

# journalctl -fu lxd
```
Jun 26 12:10:31 linovo systemd[1]: Starting LXD - main daemon...
Jun 26 12:10:31 linovo lxd[65579]: t=2021-06-26T12:10:31-0600 lvl=warn msg=" - Couldn't find the CGroup devices controller, device access control won't work"
Jun 26 12:10:31 linovo lxd[65579]: t=2021-06-26T12:10:31-0600 lvl=warn msg=" - Couldn't find the CGroup freezer controller, pausing/resuming containers won't work"
Jun 26 12:10:31 linovo lxd[65579]: t=2021-06-26T12:10:31-0600 lvl=warn msg=" - Couldn't find the CGroup hugetlb controller, hugepage limits will be ignored"
Jun 26 12:10:31 linovo lxd[65579]: t=2021-06-26T12:10:31-0600 lvl=warn msg=" - Couldn't find the CGroup network priority controller, network priority will be ignored"
Jun 26 12:10:31 linovo lxd[65579]: t=2021-06-26T12:10:31-0600 lvl=warn msg="Dqlite: attempt 0: server 1: no known leader"
```

# Cgroup mounts
linovo /usr/src/linux # cat /proc/mounts | grep cgroup
cgroup2 /sys/fs/cgroup cgroup2 rw,nosuid,nodev,noexec,relatime 0 0
linovo /usr/src/linux #
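For anyone trying to reproduce this, a minimal sketch for confirming which cgroup layout a host is on. It reads a mounts table in /proc/mounts format; the function name `cgroup_layout` and the labels it prints are made up for illustration and are not something lxd itself provides.

```shell
#!/bin/sh
# Hypothetical helper: classify the cgroup layout from a mounts table in
# /proc/mounts format (defaults to /proc/mounts itself).
cgroup_layout() {
    awk '
        $3 == "cgroup"  { v1 = 1 }   # at least one legacy (v1) hierarchy
        $3 == "cgroup2" { v2 = 1 }   # the unified (v2) hierarchy
        END {
            if (v1 && v2)  print "hybrid"
            else if (v2)   print "cgroup2"
            else if (v1)   print "cgroup1"
            else           print "none"
        }
    ' "${1:-/proc/mounts}"
}

cgroup_layout

# On a pure cgroup2 host, the controllers available at the root are listed here.
if [ -r /sys/fs/cgroup/cgroup.controllers ]; then
    cat /sys/fs/cgroup/cgroup.controllers
fi
```

With the single `cgroup2` mount shown in this report, the helper would print `cgroup2`.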
Created attachment 719475 [details] lxc-checkconfig output
Portage 3.0.20 (python 3.9.5-final-0, default/linux/amd64/17.1/desktop/plasma/systemd, gcc-11.1.0, glibc-2.33-r1, 5.12.13-gentoo x86_64)
=================================================================
System uname: Linux-5.12.13-gentoo-x86_64-Intel-R-_Core-TM-_i7-6820HQ_CPU_@_2.70GHz-with-glibc2.33
KiB Mem: 16264464 total, 5832120 free
KiB Swap: 16746492 total, 16746492 free
Timestamp of repository gentoo: Mon, 28 Jun 2021 15:30:01 +0000
Head commit of repository gentoo: ae28e2356b8cb60892daeb782327a8c0119e55a2
sh bash 5.1_p8
ld GNU ld (Gentoo 2.36.1 p3) 2.36.1
app-shells/bash: 5.1_p8::gentoo
dev-java/java-config: 2.3.1::gentoo
dev-lang/perl: 5.34.0::gentoo
dev-lang/python: 2.7.18_p11::gentoo, 3.7.10_p6::gentoo, 3.8.10_p2::gentoo, 3.9.5_p2::gentoo, 3.10.0_beta3::gentoo
dev-lang/rust: 1.53.0::gentoo
dev-util/cmake: 3.20.5::gentoo
dev-util/pkgconfig: 0.29.2::gentoo
sys-apps/baselayout: 2.7-r3::gentoo
sys-apps/sandbox: 2.24::gentoo
sys-devel/autoconf: 2.13-r1::gentoo, 2.69-r5::gentoo
sys-devel/automake: 1.11.6-r3::gentoo, 1.16.3-r1::gentoo
sys-devel/binutils: 2.36.1-r1::gentoo
sys-devel/gcc: 11.1.0-r1::gentoo
sys-devel/gcc-config: 2.4::gentoo
sys-devel/libtool: 2.4.6-r6::gentoo
sys-devel/make: 4.3::gentoo
sys-kernel/linux-headers: 5.12::gentoo (virtual/os-headers)
sys-libs/glibc: 2.33-r1::gentoo

Repositories:

gentoo
    location: /usr/portage
    sync-type: rsync
    sync-uri: rsync://rsync.gentoo.org/gentoo-portage
    priority: -1000
    sync-rsync-verify-metamanifest: yes
    sync-rsync-verify-jobs: 1
    sync-rsync-verify-max-age: 24
    sync-rsync-extra-opts:

ACCEPT_KEYWORDS="amd64 ~amd64"
ACCEPT_LICENSE="@FREE"
CBUILD="x86_64-pc-linux-gnu"
CFLAGS="-march=native -O2 -pipe -ggdb"
CHOST="x86_64-pc-linux-gnu"
CONFIG_PROTECT="/etc /usr/lib64/libreoffice/program/sofficerc /usr/share/config /usr/share/gnupg/qualified.txt /usr/share/maven-bin-3.8/conf /usr/share/sddm/scripts/Xsetup"
CONFIG_PROTECT_MASK="/etc/ca-certificates.conf /etc/dconf /etc/env.d /etc/fonts/fonts.conf /etc/gconf /etc/gentoo-release /etc/php/apache2-php8.0/ext-active/ /etc/php/cgi-php8.0/ext-active/ /etc/php/cli-php8.0/ext-active/ /etc/revdep-rebuild /etc/sandbox.d /etc/terminfo /etc/texmf/language.dat.d /etc/texmf/language.def.d /etc/texmf/updmap.d /etc/texmf/web2c"
CXXFLAGS="-march=native -O2 -pipe -ggdb"
DISTDIR="/usr/portage/distfiles"
ENV_UNSET="CARGO_HOME DBUS_SESSION_BUS_ADDRESS DISPLAY GOBIN GOPATH PERL5LIB PERL5OPT PERLPREFIX PERL_CORE PERL_MB_OPT PERL_MM_OPT XAUTHORITY XDG_CACHE_HOME XDG_CONFIG_HOME XDG_DATA_HOME XDG_RUNTIME_DIR"
FCFLAGS="-O2 -pipe"
FEATURES="assume-digests binpkg-docompress binpkg-dostrip binpkg-logs candy cgroup config-protect-if-modified distlocks ebuild-locks fakeroot fixlafiles ipc-sandbox merge-sync multilib-strict network-sandbox news parallel-fetch pid-sandbox preserve-libs protect-owned qa-unresolved-soname-deps sandbox sfperms strict unknown-features-warn unmerge-logs unmerge-orphans userfetch userpriv usersandbox usersync xattr"
FFLAGS="-O2 -pipe"
GENTOO_MIRRORS="http://distfiles.gentoo.org"
LANG="en_US.UTF-8"
LDFLAGS="-Wl,-O1 -Wl,--as-needed"
LINGUAS=" de us es"
MAKEOPTS=" -j9 -l8"
PKGDIR="/usr/portage/packages"
PORTAGE_CONFIGROOT="/"
PORTAGE_RSYNC_OPTS="--recursive --links --safe-links --perms --times --omit-dir-times --compress --force --whole-file --delete --stats --human-readable --timeout=180 --exclude=/distfiles --exclude=/local --exclude=/packages --exclude=/.git"
PORTAGE_TMPDIR="/var/tmp"
USE="X a52 aac acl acpi activities alsa amd64 berkdb bluetooth branding bzip2 cairo cdda cdr cli crypt cups dbus declarative dri dts dvd dvdr emboss encode exif flac fortran gdbm gif glamor gpm gstreamer gtk gui iconv icu ipv6 jpeg kde kipi kwallet lcms libglvnd libnotify libtirpc mad mng mp3 mp4 mpeg multilib ncurses nls nptl ogg opengl openmp openssl pam pango pcre pdf phonon plasma png policykit ppds pulseaudio qml qt5 readline sdl seccomp semantic-desktop spell split-usr ssl startup-notification svg systemd tcpd tiff truetype udev udisks unicode upower usb vorbis widgets wxwidgets x264 xattr xcb xml xv xvid zlib" ABI_X86="64 32" ADA_TARGET="gnat_2018" ALSA_CARDS="intel-hda" APACHE2_MODULES="authn_core authz_core socache_shmcb unixd actions alias auth_basic authn_alias authn_anon authn_dbm authn_default authn_file authz_dbm authz_default authz_groupfile authz_host authz_owner authz_user autoindex cache cgi cgid dav dav_fs dav_lock deflate dir disk_cache env expires ext_filter file_cache filter headers include info log_config logio mem_cache mime mime_magic negotiation rewrite setenvif speling status unique_id userdir usertrack vhost_alias" CALLIGRA_FEATURES="karbon sheets words" COLLECTD_PLUGINS="df interface irq load memory rrdtool swap syslog" CPU_FLAGS_X86="sse3 mmxext fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch epb intel_pt tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx rdseed adx smap clflushopt xsaveopt xsavec xgetbv1 xsaves dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp" ELIBC="glibc" GPSD_PROTOCOLS="ashtech aivdm earthmate evermore fv18 garmin garmintxt gpsclock greis isync itrax mtk3301 nmea ntrip navcom oceanserver oldstyle oncore rtcm104v2 rtcm104v3 sirf skytraq superstar2 timing tsip tripmate tnt ublox ubx" GRUB_PLATFORMS="efi-64" INPUT_DEVICES="evdev synaptics wacom" KERNEL="linux" L10N="de" LCD_DEVICES="bayrad cfontz cfontz633 glk hd44780 lb216 lcdm001 mtxorb ncurses text" LIBREOFFICE_EXTENSIONS="presenter-console presenter-minimizer" LUA_SINGLE_TARGET="lua5-1" LUA_TARGETS="lua5-1" OFFICE_IMPLEMENTATION="libreoffice" PHP_TARGETS="php7-3 php7-4" POSTGRES_TARGETS="postgres12" PYTHON_SINGLE_TARGET="python3_8" PYTHON_TARGETS="python3_7 python3_9 python3_8" QEMU_SOFTMMU_TARGETS="arm x86_64" QEMU_USER_TARGETS="x86_64" RUBY_TARGETS="ruby27 ruby30" USERLAND="GNU" VIDEO_CARDS="intel i965 nvidia i915" XTABLES_ADDONS="quota2 psd pknock lscan length2 ipv4options ipset ipp2p iface geoip fuzzy condition tee tarpit sysrq proto steal rawnat logmark ipmark dhcpmac delude chaos account"
Unset: CC, CPPFLAGS, CTARGET, CXX, EMERGE_DEFAULT_OPTS, INSTALL_MASK, LC_ALL, PORTAGE_BINHOST, PORTAGE_BUNZIP2_COMMAND, PORTAGE_COMPRESS, PORTAGE_COMPRESS_FLAGS, PORTAGE_RSYNC_EXTRA_OPTS, RUSTFLAGS
Hmm, I find it really weird that cgroup2 wouldn't work on systemd. Those seem to be warnings too, not fatal errors. Maybe "lxd -v --debug init" will show more information? I suspect the breakage is elsewhere.
```bash
linovo /home/pscp # lxd -v --debug init
DBUG[07-01|13:28:39] Connecting to a local LXD over a Unix socket
DBUG[07-01|13:28:39] Sending request to LXD method=GET url=http://unix.socket/1.0 etag=
```

And here it hangs, with no more output.
Do you have some ideas where else I could be checking for errors?
(In reply to Stefan Prostler from comment #4)
> ```bash
> linovo /home/pscp # lxd -v --debug init
> DBUG[07-01|13:28:39] Connecting to a local LXD over a Unix socket
> DBUG[07-01|13:28:39] Sending request to LXD method=GET
> url=http://unix.socket/1.0 etag=
> ```
>
> And here it hangs, with no more output.
> Do you have some ideas where else I could be checking for errors?

You could strace it:

strace /usr/sbin/lxd init

Also you *might* find something more in the unfiltered journalctl output. For example,

Jul 02 06:34:30 gentoo-systemd-test systemd[1]: Starting LXD - main daemon...
Jul 02 06:34:30 gentoo-systemd-test systemd[1]: Listening on LXD - unix socket.
Jul 02 06:34:30 gentoo-systemd-test systemd[1]: Condition check resulted in FUSE filesystem for LXC being skipped.
Jul 02 06:34:30 gentoo-systemd-test systemd[1]: Starting LXD - unix socket.
Jul 02 06:34:30 gentoo-systemd-test systemd[1]: Reached target Network is Online.

does not all show up with -fu lxd.
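Since lxd is a multi-threaded Go binary, a plain `strace` only follows the thread it started with; `-f` follows the others and `-o` writes the trace to a file. A small sketch, assuming such a log exists, that prints the last syscall each traced thread logged, which is usually where a hang shows up. The helper name `last_syscalls` and the log path are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print the last logged line of every traced thread from
# a log written by `strace -f -o FILE ...` (each line starts with a PID).
last_syscalls() {
    awk '{ last[$1] = $0 } END { for (pid in last) print last[pid] }' "$1" | sort -n
}

# Usage (commented out so the snippet is safe to paste):
#   timeout 30 strace -f -o /tmp/lxd-init.strace /usr/sbin/lxd init
#   last_syscalls /tmp/lxd-init.strace
```

Lines ending in `<unfinished ...>` are calls that never returned before the trace stopped.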
- The output leads me to the missing cgroup v1 and an AppArmor complaint. Do I need cgroup v1?
- The strace gets stuck at the socket.
- I also changed the file permissions from root to the lxd group, just in case.

`journalctl -f`
```bash
Jul 02 07:12:27 linovo systemd[1]: Starting LXD - main daemon...
Jul 02 07:12:27 linovo lxd[229952]: t=2021-07-02T07:12:27-0600 lvl=warn msg="AppArmor support has been disabled because of lack of kernel support"
Jul 02 07:12:27 linovo lxd[229952]: t=2021-07-02T07:12:27-0600 lvl=warn msg=" - Couldn't find the CGroup CPU controller, CPU time limits will be ignored"
Jul 02 07:12:27 linovo lxd[229952]: t=2021-07-02T07:12:27-0600 lvl=warn msg=" - Couldn't find the CGroup CPUacct controller, CPU accounting will not be available"
Jul 02 07:12:27 linovo lxd[229952]: t=2021-07-02T07:12:27-0600 lvl=warn msg=" - Couldn't find the CGroup devices controller, device access control won't work"
Jul 02 07:12:27 linovo lxd[229952]: t=2021-07-02T07:12:27-0600 lvl=warn msg=" - Couldn't find the CGroup freezer controller, pausing/resuming containers won't work"
Jul 02 07:12:27 linovo lxd[229952]: t=2021-07-02T07:12:27-0600 lvl=warn msg=" - Couldn't find the CGroup hugetlb controller, hugepage limits will be ignored"
Jul 02 07:12:27 linovo lxd[229952]: t=2021-07-02T07:12:27-0600 lvl=warn msg=" - Couldn't find the CGroup network priority controller, network priority will be ignored"
Jul 02 07:12:28 linovo lxd[229952]: t=2021-07-02T07:12:28-0600 lvl=warn msg="Dqlite: attempt 0: server 1: no known leader"
Jul 02 07:12:28 linovo lxd[229952]: t=2021-07-02T07:12:28-0600 lvl=warn msg="Dqlite: attempt 1: server 1: no known leader"
Jul 02 07:12:28 linovo lxd[229952]: t=2021-07-02T07:12:28-0600 lvl=warn msg="Dqlite: attempt 2: server 1: no known leader"
```

strace /usr/sbin/lxd init
```bash
futex(0xc000182150, FUTEX_WAKE_PRIVATE, 1) = 1
openat(AT_FDCWD, "/usr/lib/locale/locale-archive", O_RDONLY|O_CLOEXEC) = 3
newfstatat(3, "", {st_mode=S_IFREG|0644, st_size=4917040, ...}, AT_EMPTY_PATH) = 0
mmap(NULL, 4917040, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7fb26bb4f000
close(3) = 0
mmap(NULL, 262144, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fb28000a000
ioctl(2, TCGETS, {B38400 opost isig icanon echo ...}) = 0
futex(0xc000182150, FUTEX_WAKE_PRIVATE, 1) = 1
socket(AF_UNIX, SOCK_STREAM|SOCK_CLOEXEC|SOCK_NONBLOCK, 0) = 3
connect(3, {sa_family=AF_UNIX, sun_path="/var/lib/lxd/unix.socket"}, 27) = 0
epoll_ctl(4, EPOLL_CTL_ADD, 3, {events=EPOLLIN|EPOLLOUT|EPOLLRDHUP|EPOLLET, data={u32=2182883632, u64=140404663789872}}) = 0
getsockname(3, {sa_family=AF_UNIX}, [112 => 2]) = 0
getpeername(3, {sa_family=AF_UNIX, sun_path="/var/lib/lxd/unix.socket"}, [112 => 27]) = 0
futex(0xc000182150, FUTEX_WAKE_PRIVATE, 1) = 1
read(3, 0xc0003d5000, 4096) = -1 EAGAIN (Resource temporarily unavailable)
epoll_pwait(4, [{events=EPOLLOUT, data={u32=2182883632, u64=140404663789872}}], 128, 0, NULL, 2) = 1
epoll_pwait(4, [{events=EPOLLOUT, data={u32=2182883632, u64=140404663789872}}], 128, -1, NULL, 7) = 1
epoll_pwait(4, [], 128, 0, NULL, 0) = 0
epoll_pwait(4, ^Z
[7]+ Stopped strace /usr/sbin/lxd init
```

```bash
linovo /home/pscp # ls -lha /var/lib/lxd/unix.socket
srw-rw---- 1 root lxd 0 Jun 29 18:07 /var/lib/lxd/unix.socket
linovo /home/pscp #
```
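In the strace output, `connect()` on /var/lib/lxd/unix.socket returns 0 but the following `read()` gets EAGAIN and `epoll_pwait` then blocks, which suggests something accepts the connection without ever answering. A minimal sketch for checking what sits at that path and which process holds the listening end; `socket_state` is a made-up helper name, and the path is simply the one from the trace.

```shell
#!/bin/sh
# Hypothetical helper: report what kind of file sits at the LXD socket path.
socket_state() {
    path=${1:-/var/lib/lxd/unix.socket}
    if [ -S "$path" ]; then
        echo "socket"
    elif [ -e "$path" ]; then
        echo "not-a-socket"
    else
        echo "missing"
    fi
}

socket_state

# List the process (if any) holding the listening end of that socket.
# With systemd socket activation this may be systemd rather than lxd.
if command -v ss >/dev/null 2>&1; then
    ss -xlp 2>/dev/null | grep -F /var/lib/lxd/unix.socket || echo "no listener found"
fi
```

Run it as root, otherwise `ss -p` may not be able to show the owning process.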
No, those warnings are fine. It should work. Is that everything from the strace? Hmm, maybe "-v --debug" will add more. Also, it should be run with --group lxd by default (from the service file).

What happens if you do:

systemctl stop lxd
killall lxd
lxd --debug --group lxd

then wait a bit. Can you use "lxc list" or a similar "lxc" command?

(And you can do the whole command series twice; maybe it fails to create a socket because of some permission issue?)
# Maybe it has something to do with the subuid and subgid

kill
```bash
linovo /home/pscp # kill -9 `ps aux | grep lxd | awk '{print $2}'`
bash: kill: (32951) - No such process
linovo /home/pscp #
```

`lxd --debug --group lxd `
```bash
linovo /home/pscp # lxd --debug --group lxd
DBUG[07-02|08:12:30] Connecting to a local LXD over a Unix socket
DBUG[07-02|08:12:30] Sending request to LXD method=GET url=http://unix.socket/1.0 etag=
INFO[07-02|08:12:30] LXD 4.0.5 is starting in normal mode path=/var/lib/lxd
INFO[07-02|08:12:30] Kernel uid/gid map:
INFO[07-02|08:12:30] - u 0 0 4294967295
INFO[07-02|08:12:30] - g 0 0 4294967295
INFO[07-02|08:12:30] Configured LXD uid/gid map:
INFO[07-02|08:12:30] - u 0 1000000 1000000000
INFO[07-02|08:12:30] - g 0 1000000 1000000000
WARN[07-02|08:12:30] AppArmor support has been disabled because of lack of kernel support
INFO[07-02|08:12:30] Kernel features:
INFO[07-02|08:12:30] - closing multiple file descriptors efficiently: yes
INFO[07-02|08:12:30] - netnsid-based network retrieval: yes
INFO[07-02|08:12:30] - pidfds: yes
INFO[07-02|08:12:30] - uevent injection: yes
INFO[07-02|08:12:30] - seccomp listener: yes
INFO[07-02|08:12:30] - seccomp listener continue syscalls: yes
INFO[07-02|08:12:30] - seccomp listener add file descriptors: yes
INFO[07-02|08:12:30] - attach to namespaces via pidfds: yes
INFO[07-02|08:12:30] - safe native terminal allocation : yes
INFO[07-02|08:12:30] - unprivileged file capabilities: yes
INFO[07-02|08:12:30] - cgroup layout: cgroup2
WARN[07-02|08:12:30] - Couldn't find the CGroup blkio, disk I/O limits will be ignored
WARN[07-02|08:12:30] - Couldn't find the CGroup blkio.weight, disk priority will be ignored
WARN[07-02|08:12:30] - Couldn't find the CGroup CPU controller, CPU time limits will be ignored
WARN[07-02|08:12:30] - Couldn't find the CGroup CPUacct controller, CPU accounting will not be available
WARN[07-02|08:12:30] - Couldn't find the CGroup hugetlb controller, hugepage limits will be ignored
WARN[07-02|08:12:30] - Couldn't find the CGroup network priority controller, network priority will be ignored
INFO[07-02|08:12:30] - shiftfs support: no
INFO[07-02|08:12:30] Initializing local database
DBUG[07-02|08:12:30] Initializing database gateway
DBUG[07-02|08:12:30] Start database node address= role=voter id=1
DBUG[07-02|08:12:30] Connecting to a local LXD over a Unix socket
DBUG[07-02|08:12:30] Sending request to LXD method=GET url=http://unix.socket/1.0 etag=
DBUG[07-02|08:12:30] Detected stale unix socket, deleting
DBUG[07-02|08:12:30] Detected stale unix socket, deleting
INFO[07-02|08:12:30] Starting /dev/lxd handler:
INFO[07-02|08:12:30] - binding devlxd socket socket=/var/lib/lxd/devlxd/sock
INFO[07-02|08:12:30] REST API daemon:
INFO[07-02|08:12:30] - binding Unix socket socket=/var/lib/lxd/unix.socket
INFO[07-02|08:12:30] Initializing global database
WARN[07-02|08:12:30] Dqlite: attempt 0: server 1: no known leader
WARN[07-02|08:12:30] Dqlite: attempt 1: server 1: no known leader
WARN[07-02|08:12:30] Dqlite: attempt 2: server 1: no known leader
WARN[07-02|08:12:31] Dqlite: attempt 3: server 1: no known leader
```

```bash
linovo /home/pscp # lxc list
Error: Get "http://unix.socket/1.0": dial unix /var/lib/lxd/unix.socket: connect: connection refused
linovo /home/pscp #
```
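A side note on the `No such process` message from the kill command in this comment: `ps aux | grep lxd` also matches the `grep lxd` process itself, which has already exited by the time `kill` runs. A small sketch of an alternative using `pgrep`, which does not match itself; the helper name `pids_of` is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print the PIDs of processes named exactly "$1".
pids_of() {
    pgrep -x "$1" || true   # pgrep exits non-zero when nothing matches
}

# Usage (as root), sending SIGTERM first rather than SIGKILL:
#   for pid in $(pids_of lxd); do kill "$pid"; done
```

An empty result from `pids_of lxd` means no daemon is left running.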
`lxc list` hangs
```bash
linovo /home/pscp # lxc list
```
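To tell a hang apart from a refused connection without blocking the shell, the same `GET /1.0` request that the debug output shows can be sent with a timeout. A rough sketch using curl's Unix socket support; the helper name `probe_lxd` and the 5 second default are assumptions, and the socket path is the one from this report.

```shell
#!/bin/sh
# Hypothetical helper: send GET /1.0 to the LXD Unix socket and give up after
# a few seconds instead of hanging. Returns curl's exit status
# (7: could not connect, 28: the request timed out).
probe_lxd() {
    sock=${1:-/var/lib/lxd/unix.socket}
    curl --silent --show-error --max-time "${2:-5}" \
         --unix-socket "$sock" http://unix.socket/1.0
}

# Usage (as root or as a member of the lxd group):
#   probe_lxd; echo "curl exit status: $?"
```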
(In reply to Stefan Prostler from comment #8)
> # Maybe it has something to do with the subuid and subgid

lxd itself should work fine without subuid/subgid, but lxc would throw an error there.

> ```bash
> linovo /home/pscp # lxc list
> Error: Get "http://unix.socket/1.0": dial unix /var/lib/lxd/unix.socket:
> connect: connection refused
> linovo /home/pscp #
> ```

This does look like some permission problem... Maybe try (as root):

systemctl stop lxd
killall lxd
ps aux | grep lxd
rm /var/lib/lxd/unix.socket
lxd --debug -v --group lxd

Note that your user needs to be added to the lxd group if you wish to use 'lxc' with it.

You can also try to set up some logging,

lxd --debug -v --logfile /var/log/lxd/lxd.log

if you don't have logging enabled by default. That lxd.log might prove useful.

Starting to run out of ideas here; you're most likely much better served at https://discuss.linuxcontainers.org if the above steps don't prove helpful.

Oh, one more thing: does lxd-4.0.5 still work, and 4.0.6 does not? What about lxc-4.0.6 vs. lxc-4.0.9-r1? Because indeed there's a fix for recent kernels in 4.0.9-r1:

https://gitweb.gentoo.org/repo/gentoo.git/commit/app-emulation/lxc?id=fb00aa98de17dba7ffb4ef5fed6608af8a6968d8

So if you are running lxc-4.0.6 it might be due to that.
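A quick sketch for checking whether an installed lxc is older than the 4.0.9-r1 mentioned here. `version_lt` is a made-up helper built on `sort -V`, which is not Portage's own version comparison and may disagree with it on suffixes such as `_rc`; for the versions in this thread it is good enough.

```shell
#!/bin/sh
# Hypothetical helper: succeed if version $1 sorts strictly before version $2.
version_lt() {
    [ "$1" != "$2" ] &&
        [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

# Usage, with the installed version taken from e.g. `qlist -Iv app-emulation/lxc`
# (qlist is part of app-portage/portage-utils):
#   if version_lt 4.0.6 4.0.9-r1; then echo "lxc predates the kernel fix"; fi
```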
I tried removing ~amd64 and went down to 4.0.5, but that made no difference either.

```bash
echo =app-emulation/lxd-4.0.6 >> /etc/portage/package.mask
emerge -avDuN lxd
...
```

I tried as root as well as a normal user in the lxd group.

```bash
 ✘ ⚙ pscp@linovo  ~/tmx-cups  getent group| grep lxd
lxd:x:402:pscp
```

Even with --logfile lxd.log -l DEBUG I don't get any more useful complaints.

So thank you a lot for the effort. I will try https://discuss.linuxcontainers.org and report back here when something comes up.