I've set up an lxc Gentoo guest on a Gentoo host and tried to update the system, and portage failed at the unpack stage. I checked FEATURES and toggled them one by one until I discovered that disabling the "userpriv" feature fixes the issue.

Reproducible: Always

Steps to Reproduce:
1. emerge =app-emulation/lxc-2.0.7
2. lxc-create -n testbox -t gentoo
3. lxc-start -n testbox
4. lxc-attach -n testbox
5. inside the lxc: FEATURES="userpriv" emerge -1v portage

Actual Results:
emerge fails at the unpack stage

Expected Results:
emerge should succeed

Emerge info on the lxc Gentoo guest:

testbox ~ # emerge --info
setlocale: unsupported locale setting
setlocale: unsupported locale setting
Portage 2.3.3 (python 3.4.5-final-0, default/linux/amd64/13.0, gcc-4.9.4, glibc-2.23-r3, 4.9.6-gentoo-r1.46 x86_64)
=================================================================
System uname: Linux-4.9.6-gentoo-r1.46-x86_64-Pentium-R-_Dual-Core_CPU_T4200_@_2.00GHz-with-gentoo-2.3
KiB Mem: 4049392 total, 3588296 free
KiB Swap: 4192252 total, 4192252 free
Timestamp of repository gentoo: Tue, 07 Feb 2017 00:45:01 +0000
sh bash 4.3_p48-r1
ld GNU ld (Gentoo 2.25.1 p1.1) 2.25.1
app-shells/bash: 4.3_p48-r1::gentoo
dev-lang/perl: 5.22.3_rc4::gentoo
dev-lang/python: 2.7.12::gentoo, 3.4.5::gentoo
dev-util/pkgconfig: 0.28-r2::gentoo
sys-apps/baselayout: 2.3::gentoo
sys-apps/openrc: 0.22.4::gentoo
sys-apps/sandbox: 2.10-r3::gentoo
sys-devel/autoconf: 2.69::gentoo
sys-devel/automake: 1.14.1::gentoo, 1.15::gentoo
sys-devel/binutils: 2.25.1-r1::gentoo
sys-devel/gcc: 4.9.4::gentoo
sys-devel/gcc-config: 1.7.3::gentoo
sys-devel/libtool: 2.4.6-r2::gentoo
sys-devel/make: 4.2.1::gentoo
sys-kernel/linux-headers: 4.4::gentoo (virtual/os-headers)
sys-libs/glibc: 2.23-r3::gentoo
Repositories:

gentoo
    location: /usr/portage
    sync-type: rsync
    sync-uri: rsync://rsync.gentoo.org/gentoo-portage
    priority: -1000

ACCEPT_KEYWORDS="amd64"
ACCEPT_LICENSE="* -@EULA"
CBUILD="x86_64-pc-linux-gnu"
CFLAGS="-O2 -pipe"
CHOST="x86_64-pc-linux-gnu"
CONFIG_PROTECT="/etc /usr/share/gnupg/qualified.txt"
CONFIG_PROTECT_MASK="/etc/ca-certificates.conf /etc/env.d /etc/fonts/fonts.conf /etc/gconf /etc/gentoo-release /etc/revdep-rebuild /etc/sandbox.d /etc/splash /etc/terminfo /etc/texmf/language.dat.d /etc/texmf/language.def.d /etc/texmf/updmap.d /etc/texmf/web2c"
CXXFLAGS="-O2 -pipe"
DISTDIR="/usr/portage/distfiles"
FCFLAGS="-O2 -pipe"
FEATURES="assume-digests binpkg-logs config-protect-if-modified distlocks ebuild-locks fixlafiles merge-sync news parallel-fetch preserve-libs protect-owned sandbox sfperms strict unknown-features-warn unmerge-logs unmerge-orphans userfetch userpriv usersandbox usersync xattr"
FFLAGS="-O2 -pipe"
GENTOO_MIRRORS="http://distfiles.gentoo.org"
LANG="ru_RU.utf8"
LDFLAGS="-Wl,-O1 -Wl,--as-needed"
PKGDIR="/usr/portage/packages"
PORTAGE_CONFIGROOT="/"
PORTAGE_RSYNC_OPTS="--recursive --links --safe-links --perms --times --omit-dir-times --compress --force --whole-file --delete --stats --human-readable --timeout=180 --exclude=/distfiles --exclude=/local --exclude=/packages --exclude=/.git"
PORTAGE_TMPDIR="/var/tmp"
USE="acl amd64 berkdb bindist bzip2 cli cracklib crypt cxx dri fortran gdbm iconv ipv6 modules multilib ncurses nls nptl openmp pam pcre readline seccomp session ssl tcpd unicode xattr zlib" ABI_X86="64" ALSA_CARDS="ali5451 als4000 atiixp atiixp-modem bt87x ca0106 cmipci emu10k1x ens1370 ens1371 es1938 es1968 fm801 hda-intel intel8x0 intel8x0m maestro3 trident usb-audio via82xx via82xx-modem ymfpci" APACHE2_MODULES="authn_core authz_core socache_shmcb unixd actions alias auth_basic authn_alias authn_anon authn_dbm authn_default authn_file authz_dbm authz_default authz_groupfile authz_host authz_owner authz_user autoindex cache cgi cgid dav dav_fs dav_lock deflate dir disk_cache env expires ext_filter file_cache filter headers include info log_config logio mem_cache mime mime_magic negotiation rewrite setenvif speling status unique_id userdir usertrack vhost_alias" CALLIGRA_FEATURES="kexi words flow plan sheets stage tables krita karbon braindump author" COLLECTD_PLUGINS="df interface irq load memory rrdtool swap syslog" CPU_FLAGS_X86="mmx sse sse2" ELIBC="glibc" GPSD_PROTOCOLS="ashtech aivdm earthmate evermore fv18 garmin garmintxt gpsclock isync itrax mtk3301 nmea ntrip navcom oceanserver oldstyle oncore rtcm104v2 rtcm104v3 sirf skytraq superstar2 timing tsip tripmate tnt ublox ubx" INPUT_DEVICES="libinput keyboard mouse" KERNEL="linux" LCD_DEVICES="bayrad cfontz cfontz633 glk hd44780 lb216 lcdm001 mtxorb ncurses text" LIBREOFFICE_EXTENSIONS="presenter-console presenter-minimizer" OFFICE_IMPLEMENTATION="libreoffice" PHP_TARGETS="php5-6" PYTHON_SINGLE_TARGET="python2_7" PYTHON_TARGETS="python2_7 python3_4" RUBY_TARGETS="ruby21" USERLAND="GNU" VIDEO_CARDS="amdgpu fbdev intel nouveau radeon radeonsi vesa dummy v4l" XTABLES_ADDONS="quota2 psd pknock lscan length2 ipv4options ipset ipp2p iface geoip fuzzy condition tee tarpit sysrq steal rawnat logmark ipmark dhcpmac delude chaos account"
Unset: CC, CPPFLAGS, CTARGET, CXX, EMERGE_DEFAULT_OPTS, INSTALL_MASK, LC_ALL, MAKEOPTS, PORTAGE_BUNZIP2_COMMAND, PORTAGE_COMPRESS, PORTAGE_COMPRESS_FLAGS, PORTAGE_RSYNC_EXTRA_OPTS, USE_PYTHON
Created attachment 465408 [details]
full-log-1.txt

Basically, it's a log of the failure. Important lines:

>>> Source unpacked in /var/tmp/portage/sys-apps/portage-2.3.3/work
Traceback (most recent call last):
  File "/var/tmp/portage/._portage_reinstall_.9vqnbny1/pym/portage/locks.py", line 152, in lockfile
    myfd = os.open(lockfilename, os.O_CREAT|os.O_RDWR, 0o660)
  File "/var/tmp/portage/._portage_reinstall_.9vqnbny1/pym/portage/__init__.py", line 250, in __call__
    rval = self._func(*wrapped_args, **wrapped_kwargs)
PermissionError: [Errno 13] Permission denied: b'/var/tmp/portage/sys-apps/.portage-2.3.3.portage_lockfile'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/var/tmp/portage/._portage_reinstall_.9vqnbny1/bin/ebuild-ipc.py", line 282, in <module>
    sys.exit(ebuild_ipc_main(sys.argv[1:]))
  File "/var/tmp/portage/._portage_reinstall_.9vqnbny1/bin/ebuild-ipc.py", line 279, in ebuild_ipc_main
    return ebuild_ipc.communicate(args)
  File "/var/tmp/portage/._portage_reinstall_.9vqnbny1/bin/ebuild-ipc.py", line 139, in communicate
    return self._communicate(args)
  File "/var/tmp/portage/._portage_reinstall_.9vqnbny1/bin/ebuild-ipc.py", line 245, in _communicate
    if not self._daemon_is_alive():
  File "/var/tmp/portage/._portage_reinstall_.9vqnbny1/bin/ebuild-ipc.py", line 124, in _daemon_is_alive
    wantnewlockfile=True, flags=os.O_NONBLOCK)
  File "/var/tmp/portage/._portage_reinstall_.9vqnbny1/pym/portage/locks.py", line 158, in lockfile
    raise PermissionDenied(func_call)
portage.exception.PermissionDenied: open('/var/tmp/portage/sys-apps/.portage-2.3.3.portage_lockfile')
Created attachment 465410 [details]
full-log-2.txt

I've patched locks.py to print some more info. Important lines:

lockfile /var/tmp/portage/sys-apps/.portage-2.3.3.portage_lockfile, mode 66, perms 660, uid 250, gid 250
-rw-r----- 1 root portage 0 Feb 27 17:12 /var/tmp/portage/sys-apps/.portage-2.3.3.portage_lockfile

Basically, portage, running as uid portage, tries to open a lockfile that is owned by root and only group-readable, and fails.

Here's the patch:

--- /usr/lib/python3.4/site-packages/portage/locks.py.back 2017-02-27 17:32:31.843193592 +0300
+++ /usr/lib/python3.4/site-packages/portage/locks.py 2017-02-27 17:09:53.169179355 +0300
@@ -149,6 +149,9 @@
 		old_mask = os.umask(000)
 		try:
 			try:
+				buf = "lockfile %s, mode %d, perms %o, uid %d, gid %d" % ( lockfilename, os.O_CREAT|os.O_RDWR, 0o660, os.getuid(), os.getgid())
+				print(buf)
+				os.system("ls -la %s" % lockfilename)
 				myfd = os.open(lockfilename, os.O_CREAT|os.O_RDWR, 0o660)
 			except OSError as e:
 				func_call = "open('%s')" % lockfilename
@@ -325,6 +328,9 @@
 	else:
 		raise InvalidData
 
+	buf = "unlockfile %s, uid %d, gid %d" % ( lockfilename, os.getuid(), os.getgid())
+	print(buf)
+
 	if(myfd == HARDLINK_FD):
 		unhardlink_lockfile(lockfilename, unlinkfile=unlinkfile)
 		return True
Created attachment 465414 [details, diff]
feature-userpriv-chown.patch

I couldn't figure out why the lock isn't freed by the time portage drops privileges, but I noticed that when the lock is created, portage changes the group of the file but leaves the owner intact. This patch is more of a hack; I'm not sure it's the correct fix for the issue, but it worked for me.
What is the underlying filesystem type that you are using for /var/tmp/portage?
It's ext3 in the guest. Here are the filesystem options in case they're relevant:

Filesystem features: has_journal ext_attr resize_inode dir_index filetype needs_recovery sparse_super large_file
Filesystem flags: signed_directory_hash
Default mount options: journal_data_ordered user_xattr acl

I'm using tmpfs on the host system. I disabled it and tried the same ext3 filesystem, but the issue didn't reproduce for me (I made sure to disable my patch first).
Normally, it's possible to lock the file without changing the uid. It should not be necessary to change the uid, so I think there's something wrong with your configuration. We shouldn't change the code unless we have a very good explanation for the reasoning.
Yes, I know that something is wrong, otherwise it would work. I can provide more info if you'd like to debug this issue.

I think the issue might be caused by the lockfile permissions, the lockfile owner uid/gid (which my patch updates), or the lockfile not being released before privileges are dropped. As you can see, the lockfile has permissions 0640 instead of 0660, with owner root and group portage. If the owner is changed to portage, opening the lockfile works. If the permissions are changed to 0660, I guess it should work too.

As for the result of the call:

myfd = os.open(lockfilename, os.O_CREAT|os.O_RDWR, 0o660)

I think there's some issue with the umask interaction. I wrote a simple script and ran it on the host and in the guest. Here's the script:

#!/usr/bin/python
import os
os.umask(0o0002)
os.open("/tmp/testfile", os.O_CREAT | os.O_RDWR, 0o0660)
os.system("ls -la /tmp/testfile")

Here's the host:

linux ~ # rm /tmp/testfile
linux ~ # LC_ALL=C python /var/lib/lxc/testbox/rootfs/tmp/python-test.py
-rw-rw---- 1 root root 0 Feb 28 15:03 /tmp/testfile

And the guest:

testbox ~ # rm /tmp/testfile
testbox ~ # LC_ALL=C python /tmp/python-test.py
-rw-r----- 1 root root 0 Feb 28 15:04 /tmp/testfile

Both host and guest have the default umask 0022 in /etc/login.defs.

I'll provide a patch that changes the permissions of lockfiles instead. It should fix the issue, but I'm still not sure what causes the problem with umask; it shows up in lxc containers for me.
Created attachment 465512 [details, diff]
locks-chmod.patch

Another patch for the issue, which worked for me. This one corrects the file permissions of lock files in case of issues with umask.
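For readers without access to the attachment, the general idea can be sketched roughly as follows. This is not the attached patch: `open_lockfile` is a hypothetical helper written for illustration, and it assumes the calling process is allowed to change the file's mode (that is, it owns the file or runs as root).

```python
import os
import stat


def open_lockfile(path, mode=0o660):
    """Open (or create) a lock file and try to make sure it ends up with `mode`.

    Hypothetical helper for illustration, not portage's actual code.
    """
    old_mask = os.umask(0)  # clear the umask around os.open, as locks.py does
    try:
        fd = os.open(path, os.O_CREAT | os.O_RDWR, mode)
    finally:
        os.umask(old_mask)

    # Something other than the umask may still have stripped bits,
    # so compare what we got with what we asked for.
    actual = stat.S_IMODE(os.fstat(fd).st_mode)
    if actual != mode:
        try:
            os.fchmod(fd, mode)
        except OSError:
            pass  # not permitted to change the mode; leave the file as it is
    return fd
```

Taking the mode as a parameter leaves the decision about the final permissions to the caller instead of hard-coding it in the helper.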
It seems that the 022 umask set by emerge is masking out the write bit of the 0o660 specified in the locks.py os.open call. I'm not sure if calling chmod in locks.py is the best solution. If we do that, then there should be a mode parameter to the lockfile function, so that the caller can control it.
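The arithmetic behind that reading, as a small sketch. `apply_umask` is a made-up name for illustration; it models only the plain umask rule and does not account for anything else that may affect the mode of a new file.

```python
def apply_umask(requested, umask):
    """Permission bits a new file gets when only the umask is in play."""
    return requested & ~umask & 0o777


print(oct(apply_umask(0o660, 0o022)))  # 0o640: the group write bit is cleared
print(oct(apply_umask(0o660, 0o002)))  # 0o660: nothing to clear
print(oct(apply_umask(0o660, 0o000)))  # 0o660
```

A umask of 022 would give exactly the -rw-r----- seen on the lockfile, while the 0o0002 used in the test script should leave 0660 untouched.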
Actually, the umask is temporarily changed here: old_mask = os.umask(000) So we need an explanation for why that's not working. Otherwise, the patch looks pretty reasonable.
Yes, the umask is temporarily changed before creating the lock file, but for some reason it has no effect inside an lxc container for me. I haven't figured out the reason yet.
Created attachment 465620 [details] minimal test case Here's a minimal test case that hopefully you can use to reproduce the behavior. If that reproduces it, then you can use it to file an issue here: https://github.com/lxc/lxc/issues
Opened a bug on lxc issues tracker: https://github.com/lxc/lxc/issues/1448
I've finally found the root of the issue. I'll duplicate my comment from the LXC issue tracker here.

It was related to ACLs. The box I was running had a default ACL on every directory, like this:

$ getfacl /
getfacl: Removing leading '/' from absolute path names
# file: .
# owner: root
# group: root
user::rwx
group::r-x
other::r-x
default:user::rwx
default:group:r-x
default:other:r-x

I compared this setup with my other boxes (and a VirtualBox test setup) and didn't find similar ACLs on them. After that I remounted the root filesystem with 'noacl' and the issue was gone. I hadn't noticed the issue on the host system because /tmp and /var/tmp/portage were mounted as tmpfs there.

I booted into a recovery image, ran 'setfacl -kR ...' on every filesystem since I didn't need these ACLs, booted back into the system and confirmed that the issue had disappeared.

Now I know the missing reproduction step (NOTE: don't do this if you have ACLs set up on the host or in any guest whose filesystem lives under /var/lib/lxc, or save them first):

setfacl -dR --set=u::rwx,g::rx,o::rx /var/lib/lxc

To remove this ACL later, run (same NOTE as above):

setfacl -kR /var/lib/lxc

To test the issue on the host box, use '/tmp' instead of '/var/lib/lxc' in the commands above.

I've fixed my setup. Please close the bug if you think portage shouldn't work around such issues in the filesystem setup and should just fail. Otherwise, the patch already attached allowed portage to work for me even with such a setup.
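This also explains why clearing the umask made no difference: per the Linux acl(5) manual page, when the parent directory has a default ACL the umask is ignored for new files, and the permission bits become the intersection of the requested mode and the default ACL entries. A small sketch of that arithmetic, assuming a default ACL with no named users or groups and no mask entry (`mode_under_default_acl` is a hypothetical name used only here):

```python
def mode_under_default_acl(requested, acl_user, acl_group, acl_other):
    """Mode of a new file created in a directory that has a default ACL.

    Each acl_* argument is a 3-bit rwx value (for example 0o5 for r-x).
    The umask takes no part in this calculation.
    """
    default_bits = (acl_user << 6) | (acl_group << 3) | acl_other
    return requested & default_bits


# default user rwx, default group r-x, default other r-x, as in the getfacl output
print(oct(mode_under_default_acl(0o660, 0o7, 0o5, 0o5)))  # 0o640
```

The result matches the -rw-r----- lockfile: the default group entry r-x has no write bit, so the group write bit requested by 0o660 is dropped no matter what umask is set.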
I'm glad you found the root cause. I suppose we could have portage try to detect interference from ACLs, but somebody interested in that would have to submit a patch.
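One possible starting point for such a check, as a sketch only: on Linux, a directory's default ACL is stored in the `system.posix_acl_default` extended attribute, so its presence can be probed with `os.getxattr`. The function name `has_default_acl` is hypothetical, and how portage would react to a positive result (warn, chmod, or fail) is left open.

```python
import errno
import os


def has_default_acl(directory):
    """Return True if `directory` carries a default POSIX ACL (Linux only)."""
    if not hasattr(os, "getxattr"):
        return False  # this platform's Python has no xattr support
    try:
        os.getxattr(directory, "system.posix_acl_default")
    except OSError as e:
        # ENODATA: attribute not set; ENOTSUP/EOPNOTSUPP: filesystem has no ACLs
        if e.errno in (errno.ENODATA, errno.ENOTSUP, errno.EOPNOTSUPP):
            return False
        raise
    return True
```

A caller could run this against the directory that will hold the lockfile, for example `has_default_acl("/var/tmp/portage")`, before deciding whether the resulting file mode can be trusted.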