Bug 641262 - btrfs - stale entries in readdir (was: sys-apps/systemd-236 fails to install: No such file or directory: b'/var/tmp/portage/sys-apps/systemd-236/image/usr/share/man/man3/sd_journal_seek_head.3')
Status: RESOLVED FIXED
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: Current packages
Hardware: All Linux
Importance: Normal normal
Assignee: Gentoo Kernel Bug Wranglers and Kernel Maintainers
URL: https://bugzilla.kernel.org/show_bug....
Whiteboard:
Keywords:
Duplicates: 641264 645426
Depends on:
Blocks:
 
Reported: 2017-12-16 10:16 UTC by Mark Nowiasz
Modified: 2019-04-13 17:55 UTC (History)
17 users (show)

See Also:
Package list:
Runtime testing required: ---


Attachments
build.log (build.log.bz2,95.51 KB, application/x-bzip)
2017-12-16 10:16 UTC, Mark Nowiasz
Test for stale (removed) files in btrfs directory listings (btrfs-stale-dirent-test.sh,1.63 KB, application/x-shellscript)
2018-01-15 10:45 UTC, Zac Medico

Description Mark Nowiasz 2017-12-16 10:16:20 UTC
Created attachment 510312 [details]
build.log

When trying to update systemd, systemd compiles just fine - but fails to install /lib/systemd/systemd-sulogin-shell:

lib/systemd/systemd-sulogin-shell
ecompressdir: bzip2 -9 /usr/share/doc
ecompressdir: bzip2 -9 /usr/share/man
Traceback (most recent call last):
  File "/usr/lib64/python3.5/site-packages/portage/dbapi/_MergeProcess.py", line 235, in _spawn
    prev_mtimes=self.prev_mtimes, counter=counter)
  File "/usr/lib64/python3.5/site-packages/portage/dbapi/vartree.py", line 1704, in wrapper
    return f(self, *args, **kwargs)
  File "/usr/lib64/python3.5/site-packages/portage/dbapi/vartree.py", line 5150, in merge
    counter=counter)
  File "/usr/lib64/python3.5/site-packages/portage/dbapi/vartree.py", line 3965, in treewalk
    file_mode = os.lstat(fpath).st_mode
  File "/usr/lib64/python3.5/site-packages/portage/__init__.py", line 250, in __call__
    rval = self._func(*wrapped_args, **wrapped_kwargs)
FileNotFoundError: [Errno 2] No such file or directory: b'/var/tmp/portage/sys-apps/systemd-236/image/usr/share/man/man3/sd_journal_seek_head.3'
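
For illustration, the traceback ends in os.lstat() being called on a path that was named by a directory listing but cannot be found. The following is a minimal sketch of that pattern, not Portage's actual treewalk code; the helper name find_stale_entries is hypothetical, and the only assumption is that names are listed first and stat'ed afterwards.

```python
import os


def find_stale_entries(root):
    """Return paths that a directory listing reports but lstat() cannot find.

    Hypothetical helper for illustration; it is not part of Portage.
    """
    stale = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            try:
                os.lstat(path)
            except FileNotFoundError:
                # The listing named an entry that no longer exists.
                stale.append(path)
    return stale
```

Running something like this against the image directory right after a failed merge would show whether the listing and the files on disk disagree.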
Comment 1 Nils Holland 2017-12-16 16:16:28 UTC
Let me add some more details:

When this happens and one looks into /var/tmp/portage/sys-apps/systemd-236/image/usr/share/man/man3/, a few files will already have been bzip2'd (including the file mentioned in the error message, sd_journal_seek_head.3, which has become sd_journal_seek_head.3.bz2). However, the vast majority of manpage files in the directory are still uncompressed at the point the error message occurs.

As a workaround, I installed systemd-236 using the "ebuild" command instead of "emerge": I first ran it up to the "install" phase, then bzip2'ed the remaining files in /var/tmp/portage/sys-apps/systemd-236/image/usr/share/man/man3/ manually, and then did the "merge". This works.

More strangely, I can reproduce this error consistently on one machine, while on another one, which is software-wise nearly identical (but slower), emerging systemd-236 via portage works just fine. Probably some kind of race condition could be the cause? Well, just a wild guess, as I have not really looked into things after working around the issue manually...
Comment 2 William 2017-12-17 12:40:12 UTC
> Probably some kind of race condition could be the cause?

Probably correct, I had the same problem and just retried in a loop and eventually it worked.
Comment 3 Mark Nowiasz 2017-12-18 06:48:15 UTC
(In reply to William from comment #2)
> > Probably some kind of race condition could be the cause?
> 
> Probably correct, I had the same problem and just retried in a loop and
> eventually it worked.

Same here - it finally installed after a couple of tries (possibly because additional processes were running in the background, slowing the emerge process down).

Nevertheless, emerging shouldn't be a game of chance :-/
Comment 4 William 2017-12-18 10:55:15 UTC
> Nevertheless emerging shouldn't be a game of chances :-/

No way, this isn't a fix.
I was surprised that -j1 didn't help.
Comment 5 Mike Gilbert gentoo-dev 2017-12-18 22:36:49 UTC
This looks like a portage bug to me. Please provide emerge --info.
Comment 6 Zac Medico gentoo-dev 2017-12-18 23:00:40 UTC
*** Bug 641264 has been marked as a duplicate of this bug. ***
Comment 7 Nils Holland 2017-12-19 00:52:47 UTC
(In reply to Mike Gilbert from comment #5)
> This looks like a portage bug to me. Please provide emerge --info.

Right, here it comes! As a side note, let me add that in addition to portage 2.3.19, which is visible in the output below, I also tried downgrading to the next lower release of portage that was in the tree (which happened to be 2.3.16) and then emerging systemd-236 again. However, the results were the same.

Let me also add that at least in my case, the fs on which all of this is happening is btrfs. None of the information we have so far seems to suggest that it's a fs issue though, but I thought I'd still mention it.

$ emerge --info sys-apps/systemd

Portage 2.3.19 (python 3.5.4-final-0, default/linux/x86/17.0/systemd, gcc-7.2.0, glibc-2.26-r3, 4.14.7-gentoo i686)
=================================================================
                         System Settings
=================================================================
System uname: Linux-4.14.7-gentoo-i686-Pentium-R-_Dual-Core_CPU_T4300_@_2.10GHz-with-gentoo-2.4.1
KiB Mem:     3961396 total,   3618452 free
KiB Swap:    3781628 total,   3781628 free
Timestamp of repository gentoo: Mon, 18 Dec 2017 06:45:01 +0000
Head commit of repository gentoo: 2cf2a65b9b897719ba0c3b376e85254146a1dc8f
sh bash 4.4_p12
ld GNU ld (Gentoo 2.29.1 p3) 2.29.1
app-shells/bash:          4.4_p12::gentoo
dev-lang/perl:            5.26.1-r1::gentoo
dev-lang/python:          2.7.14-r1::gentoo, 3.5.4-r1::gentoo
dev-util/cmake:           3.10.1::gentoo
dev-util/pkgconfig:       0.29.2::gentoo
sys-apps/baselayout:      2.4.1-r2::gentoo
sys-apps/sandbox:         2.12::gentoo
sys-devel/autoconf:       2.13::gentoo, 2.69-r4::gentoo
sys-devel/automake:       1.15.1-r1::gentoo
sys-devel/binutils:       2.29.1-r1::gentoo
sys-devel/gcc:            7.2.0::gentoo
sys-devel/gcc-config:     1.9.1::gentoo
sys-devel/libtool:        2.4.6-r4::gentoo
sys-devel/make:           4.2.1-r1::gentoo
sys-kernel/linux-headers: 4.13::gentoo (virtual/os-headers)
sys-libs/glibc:           2.26-r3::gentoo
Repositories:

gentoo
    location: /usr/portage
    sync-type: rsync
    sync-uri: rsync://celine.tisys.org/gentoo-portage
    priority: 1
    sync-rsync-extra-opts: 

tisys
    location: /usr/local/portage
    masters: gentoo
    priority: 2

ACCEPT_KEYWORDS="x86 ~x86"
ACCEPT_LICENSE="* -@EULA"
CBUILD="i686-pc-linux-gnu"
CFLAGS="-O3 -march=native -mfpmath=sse -pipe -fomit-frame-pointer -fno-stack-protector"
CHOST="i686-pc-linux-gnu"
CONFIG_PROTECT="/etc /usr/lib/libreoffice/program/sofficerc /usr/share/gnupg/qualified.txt"
CONFIG_PROTECT_MASK="/etc/ca-certificates.conf /etc/dconf /etc/env.d /etc/fonts/fonts.conf /etc/gconf /etc/gentoo-release /etc/php/apache2-php7.2/ext-active/ /etc/php/cgi-php7.2/ext-active/ /etc/php/cli-php7.2/ext-active/ /etc/revdep-rebuild /etc/sandbox.d /etc/terminfo /etc/texmf/language.dat.d /etc/texmf/language.def.d /etc/texmf/updmap.d /etc/texmf/web2c"
CXXFLAGS="-O3 -march=native -mfpmath=sse -pipe -fomit-frame-pointer -fno-stack-protector"
DISTDIR="/usr/portage/distfiles"
FCFLAGS="-O2 -march=i686 -pipe"
FEATURES="assume-digests binpkg-logs config-protect-if-modified distlocks ebuild-locks fixlafiles merge-sync multilib-strict news parallel-fetch preserve-libs protect-owned sandbox sfperms strict unknown-features-warn unmerge-logs unmerge-orphans unprivileged userfetch userpriv usersandbox usersync xattr"
FFLAGS="-O2 -march=i686 -pipe"
GENTOO_MIRRORS="ftp://celine.tisys.org/pub/gentoo/"
LANG="en_US.utf8"
LDFLAGS="-Wl,-O1 -Wl,--as-needed"
MAKEOPTS="-j3"
PKGDIR="/usr/portage/packages"
PORTAGE_CONFIGROOT="/"
PORTAGE_RSYNC_OPTS="--recursive --links --safe-links --perms --times --omit-dir-times --compress --force --whole-file --delete --stats --human-readable --timeout=180 --exclude=/distfiles --exclude=/local --exclude=/packages --exclude=/.git"
PORTAGE_TMPDIR="/var/tmp"
USE="X a52 aac acl acpi alsa berkdb bluetooth branding bzip2 cairo cdda cdr cli cracklib crypt cups cxx dbus dri dts dvd dvdr emboss encode exif fam firefox flac fortran gdbm gif glamor gpm gtk iconv introspection ipv6 jpeg lcms ldap libnotify mad mng modules mp3 mp4 mpeg ncurses nls nptl ogg opengl openmp pam pango pcre pdf png policykit ppds qt3support qt4 readline sdl seccomp session spell ssl startup-notification svg systemd tcpd tiff truetype udev udisks unicode upower usb vorbis wayland wxwidgets x264 x86 xattr xcb xml xv xvid zlib" ABI_X86="32" ALSA_CARDS="ali5451 als4000 atiixp atiixp-modem bt87x ca0106 cmipci emu10k1 emu10k1x ens1370 ens1371 es1938 es1968 fm801 hda-intel intel8x0 intel8x0m maestro3 trident usb-audio via82xx via82xx-modem ymfpci" APACHE2_MODULES="authn_core authz_core socache_shmcb unixd actions alias auth_basic authn_alias authn_anon authn_dbm authn_default authn_file authz_dbm authz_default authz_groupfile authz_host authz_owner authz_user autoindex cache cgi cgid dav dav_fs dav_lock deflate dir disk_cache env expires ext_filter file_cache filter headers include info log_config logio mem_cache mime mime_magic negotiation rewrite setenvif speling status unique_id userdir usertrack vhost_alias" CALLIGRA_FEATURES="kexi words flow plan sheets stage tables krita karbon braindump author" COLLECTD_PLUGINS="df interface irq load memory rrdtool swap syslog" CPU_FLAGS_X86="mmx mmxext sse sse2 sse3 ssse3" ELIBC="glibc" GPSD_PROTOCOLS="ashtech aivdm earthmate evermore fv18 garmin garmintxt gpsclock isync itrax mtk3301 nmea ntrip navcom oceanserver oldstyle oncore rtcm104v2 rtcm104v3 sirf skytraq superstar2 timing tsip tripmate tnt ublox ubx" INPUT_DEVICES="evdev synaptics" KERNEL="linux" LCD_DEVICES="bayrad cfontz cfontz633 glk hd44780 lb216 lcdm001 mtxorb ncurses text" LIBREOFFICE_EXTENSIONS="presenter-console presenter-minimizer" LLVM_TARGETS="BPF X86" OFFICE_IMPLEMENTATION="libreoffice" PHP_TARGETS="php5-6 php7-0" POSTGRES_TARGETS="postgres9_5" 
PYTHON_SINGLE_TARGET="python3_5" PYTHON_TARGETS="python2_7 python3_5" RUBY_TARGETS="ruby21 ruby23 ruby24" USERLAND="GNU" VIDEO_CARDS="i965" XTABLES_ADDONS="quota2 psd pknock lscan length2 ipv4options ipset ipp2p iface geoip fuzzy condition tee tarpit sysrq steal rawnat logmark ipmark dhcpmac delude chaos account"
Unset:  CC, CPPFLAGS, CTARGET, CXX, EMERGE_DEFAULT_OPTS, INSTALL_MASK, LC_ALL, PORTAGE_BUNZIP2_COMMAND, PORTAGE_COMPRESS, PORTAGE_COMPRESS_FLAGS, PORTAGE_RSYNC_EXTRA_OPTS

=================================================================
                        Package Settings
=================================================================

sys-apps/systemd-236::gentoo was built with the following:
USE="acl gcrypt kmod lz4 pam policykit seccomp ssl -apparmor -audit -build -cryptsetup -curl -elfutils -gnuefi -http -idn -importd -libidn2 -lzma -nat -qrcode (-selinux) -sysv-utils -test -usrmerge -vanilla -xkb"
Comment 8 Zac Medico gentoo-dev 2017-12-19 01:52:03 UTC
(In reply to Nils Holland from comment #1)
> When this happens and one looks into
> /var/tmp/portage/sys-apps/systemd-236/image/usr/share/man/man3/, a few files
> will already have been bzip2'd (including the file mentioned in the error
> message, sd_journal_seek_head.3, which has become
> sd_journal_seek_head.3.bz2). However, the vast majority of manpage files in
> the directory are still uncompressed at the point the error message occurs.

It's normal for ecompressdir to leave smaller files uncompressed, since compressing them does not save any space. For example, the other 3 sd_journal_seek_* files are all very small because they include the content of sd_journal_seek_head.3 like this:

   .so sd_journal_seek_head.3
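
To make the ".so" stub idea concrete, here is a small sketch that tells a stub page from a regular one. It is not taken from ecompressdir; the function name so_target is hypothetical, and it assumes a stub is a page whose only non-blank line is a single ".so <target>" request.

```python
def so_target(man_source):
    """Return the target of a '.so' stub page, or None for a regular page.

    Assumption: a stub is a page whose only non-blank line is a single
    '.so <target>' request.
    """
    lines = [line.strip() for line in man_source.splitlines() if line.strip()]
    if len(lines) == 1 and lines[0].startswith(".so "):
        return lines[0][len(".so "):].strip()
    return None
```

A stub like this is only a few dozen bytes, which is why compressing it would not save any space.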


(In reply to Nils Holland from comment #7)
> app-shells/bash:          4.4_p12::gentoo

Is everyone who's experiencing this problem using bash-4.4_p12?
Comment 9 Mark Nowiasz 2017-12-19 05:30:11 UTC
(In reply to Nils Holland from comment #7)

> Let me also add that at least in my case, the fs on which all of this is
> happening is btrfs. None of the information we have so far seems to suggest
> that it's a fs issue though, but I thought I'd still mention it.

This is interesting, because I'm using btrfs, too - on all my machines, and all of them have this problem. This could be a clue.

As to bash: 4.4_p12.

Regards
Mark
Comment 10 Zac Medico gentoo-dev 2017-12-19 06:06:17 UTC
Does the problem go away if you mount a different filesystem on /var/tmp/portage, like tmpfs?
Comment 11 Mark Nowiasz 2017-12-19 07:03:03 UTC
(In reply to Zac Medico from comment #10)
> Does the problem go away if you mount a different filesystem on
> /var/tmp/portage, like tmpfs?

Yes, indeed. After umount /var/tmp && mount -t tmpfs none /var/tmp, emerge -1 systemd worked - on three of my machines, all of them having a) btrfs and b) /var/tmp/ mounted as a subvolume.

What's odd is that this is the first and only package causing trouble concerning btrfs...
Comment 12 Mark Nowiasz 2017-12-19 08:01:35 UTC
Another headscratcher: systemd-236-r1 seems to install just fine: I've remounted /var/tmp to its original state (btrfs subvolume), emerge --sync & emerge -uDN, and no problem at all emerging systemd-236-r1. Odd.
Comment 13 Nils Holland 2017-12-19 09:05:09 UTC
(In reply to Mark Nowiasz from comment #12)
> Another headscratcher: systemd-236-r1 seems to install just fine: I've
> remounted /var/tmp to its original state (btrfs subvolume), emerge --sync &
> emerge -uDN, and no problem at all emerging systemd-236-r1. Odd.

For me, systemd-236-r1 behaved the same as -236: It failed on my faster machine and went through fine on the slower one. And because both of these machines use btrfs (including for /var/tmp), the involvement of btrfs certainly cannot be the only determining factor ... and it is only a guess whether it's a factor at all, I think.

Concerning Zac's question:

> Is everyone who's experiencing this problem using bash-4.4_p12?

Unfortunately, the slower machine I've already mentioned, which emerged both systemd-236 and systemd-236-r1 without any issues right at first try, is also using bash-4.4_p12 - just like the machine on which the process fails, and like Mark's machines. :-/
Comment 14 Zac Medico gentoo-dev 2017-12-19 09:15:25 UTC
(In reply to Nils Holland from comment #13)
> (In reply to Mark Nowiasz from comment #12)
> > Another headscratcher: systemd-236-r1 seems to install just fine: I've
> > remounted /var/tmp to its original state (btrfs subvolume), emerge --sync &
> > emerge -uDN, and no problem at all emerging systemd-236-r1. Odd.
> 
> For me, systemd-236-r1 behaved the same as -236: It failed on my faster
> machine and went through fine on the slower one. And because both of these
> machines use btrfs (including for /var/tmp), the involvement of btrfs
> certainly cannot be the only determining factor ... and it is only a guess
> whether it's a factor at all, I think.

A negative result, failure to reproduce the problem, only means that you haven't met _all_ of the conditions necessary to reproduce the problem. It doesn't prove that btrfs is bug-free.

If you can reproduce the problem only on btrfs, then that could mean something.
Comment 15 Nils Holland 2017-12-19 09:50:25 UTC
(In reply to Zac Medico from comment #14)
> (In reply to Nils Holland from comment #13)
> > (In reply to Mark Nowiasz from comment #12)
> > > Another headscratcher: systemd-236-r1 seems to install just fine: I've
> > > remounted /var/tmp to its original state (btrfs subvolume), emerge --sync &
> > > emerge -uDN, and no problem at all emerging systemd-236-r1. Odd.
> > 
> > For me, systemd-236-r1 behaved the same as -236: It failed on my faster
> > machine and went through fine on the slower one. And because both of these
> > machines use btrfs (including for /var/tmp), the involvement of btrfs
> > certainly cannot be the only determining factor ... and it is only a guess
> > whether it's a factor at all, I think.
> 
> A negative result, failure to reproduce the problem, only means that you
> haven't met _all_ of the conditions necessary to reproduce the problem. It
> doesn't prove that btrfs is bug-free.
> 
> If you can reproduce the problem only on btrfs, then that could mean
> something.

And in fact: I've now also done another test on my machine on which it previously always failed, by making /var/tmp/portage tmpfs and trying to emerge systemd-236-r1 again. And indeed: It worked right away, and also a second time right after. This is, in fact, the first time either systemd-236 or -236-r1 have successfully emerged on that machine (except when I manually did it in "single steps" via the "ebuild" command instead).

So yes ... following logic, it would now be interesting to see whether someone can reproduce the problem on something other than btrfs. If not, it might indeed be the major factor here. At least as the situation currently stands, every instance of the problem seems to have happened on btrfs.
Comment 16 Mark Nowiasz 2017-12-19 10:01:06 UTC
> For me, systemd-236-r1 behaved the same as -236: It failed on my faster
> machine and went through fine on the slower one. And because both of these
> machines use btrfs (including for /var/tmp), the involvement of btrfs
> certainly cannot be the only determining factor ... and it is only a guess
> whether it's a factor at all, I think.

Unfortunately, I was too hasty - it did fail on the *slower* of the machines I tested. But it did succeed when using tmpfs (instead of btrfs) on /var/tmp

So the only common denominator seems to be btrfs - the speed of the CPU doesn't seem to matter: in your case it failed on the faster machine, in my case it's the other way around. But btrfs isn't the only factor...

Well, I must admit, it's an interesting puzzle :-)
Comment 17 William 2017-12-19 14:14:14 UTC
Heh, I'm also using btrfs.
Comment 18 Andrew Telford 2017-12-19 15:07:04 UTC
I am also seeing this bug on btrfs on my nvme root drive, both with 236 and 236-r1.  So far I have only managed to get 236 to compile once after multiple attempts.
Comment 19 Markus Rathgeb 2017-12-19 19:19:40 UTC
Same here on btrfs.

If emerge fails, I continue using ebuild ... compile, ... install, ... qmerge, and that has succeeded every time I tried it.
Comment 20 Michał Górny archtester Gentoo Infrastructure gentoo-dev Security 2017-12-19 22:39:50 UTC
I have two systems with /var/tmp/portage on btrfs (one is using a separate fs, on the other it's just regular directory on btrfs root) and I have never seen any problem like this. What kernel version are you using?
Comment 21 Nils Holland 2017-12-19 23:22:25 UTC
(In reply to Michał Górny from comment #20)
> I have two systems with /var/tmp/portage on btrfs (one is using a separate
> fs, on the other it's just regular directory on btrfs root) and I have never
> seen any problem like this. What kernel version are you using?

In my case:

Machine where things fail:

nils@boerne (GCC7) ~ $ uname -a
Linux boerne 4.14.7-gentoo #2 SMP Mon Dec 18 11:37:40 CET 2017 i686 Pentium(R) Dual-Core CPU T4300 @ 2.10GHz GenuineIntel GNU/Linux

Machine where things work:

nils@teela (GCC7) ~ $ uname -a
Linux teela 4.14.7-gentoo #2 SMP Mon Dec 18 13:45:58 CET 2017 i686 AMD E1-2100 APU with Radeon(TM) HD Graphics AuthenticAMD GNU/Linux

So in both cases, it's sys-kernel/gentoo-sources, currently 4.14.7, but 4.14.6 a few days ago had the same behavior.

Kernel configuration is custom - in case it is of interest, I have put the .config of the failing machine up at

http://ftp.tisys.org/pub/misc/boerne-config.txt
Comment 22 Mark Nowiasz 2017-12-20 06:45:06 UTC
(In reply to Michał Górny from comment #20)
> I have two systems with /var/tmp/portage on btrfs (one is using a separate
> fs, on the other it's just regular directory on btrfs root) and I have never
> seen any problem like this. What kernel version are you using?

I'm using 4.14.4/4.14.6/4.14.7 and gcc 7.2.0. In all cases (4.14.4/4.14.6/4.14.7) the problem occurs.
Comment 23 William 2017-12-20 12:44:10 UTC
gentoo sources 4.14.4 and 4.14.6, gcc 7.2.0, bash 4.4_p12, portage 2.3.19. 
The partition is mounted with the following options:
/dev/sda4 on / type btrfs (rw,noatime,compress=lzo,ssd,discard,space_cache,autodefrag,inode_cache,subvolid=1175,subvol=/__active/ROOT)

Anything else?
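
Since several reports in this thread hinge on which filesystem and mount options sit under /var/tmp/portage, a small sketch for looking that up may help when comparing machines. It assumes input in the /proc/self/mounts format (device, mount point, type, options, ...), which differs from the "mount" output quoted above; the function name mount_for_path is hypothetical, and escaped characters in mount points (such as \040 for a space) are not handled.

```python
def mount_for_path(mounts_text, path):
    """Return (fstype, options) of the mount covering `path`, or None.

    `mounts_text` is text in /proc/self/mounts format:
    device mountpoint fstype options dump pass
    """
    best = None
    wanted = path.rstrip("/") + "/"
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue
        mountpoint, fstype, options = fields[1], fields[2], fields[3].split(",")
        prefix = mountpoint.rstrip("/") + "/"
        if wanted.startswith(prefix):
            # Prefer the longest (most specific) mount point; on ties the
            # later line wins, since later mounts shadow earlier ones.
            if best is None or len(mountpoint) >= len(best[0]):
                best = (mountpoint, fstype, options)
    return None if best is None else (best[1], best[2])
```

On a Linux system this could be fed the contents of /proc/self/mounts together with "/var/tmp/portage" to see at a glance whether the build directory is on btrfs, tmpfs or something else.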
Comment 24 Mike Gilbert gentoo-dev 2017-12-22 16:24:28 UTC
Might be interesting to see if you can reproduce this with an older kernel. Maybe something like 4.9.y.
Comment 25 Johannes Hirte 2017-12-22 20:22:29 UTC
Same here with linux-4.15-rc3 and btrfs. I hadn't noticed this before, because /var/tmp/portage is normally a tmpfs. Another workaround was using a file formatted with ext4 and loop-mounted on /var/tmp/portage.
Comment 26 Sebastian Pucilowski 2017-12-23 23:28:57 UTC
Also hit this bug with 4.14.7-gentoo and btrfs.
Comment 27 Johannes Hirte 2017-12-24 21:23:25 UTC
(In reply to Johannes Hirte from comment #25)
> Same here with linux-4.15-rc3 and btrfs. I've not noticed this before,
> because /var/tmp/portage normally is a tmpfs. Another workaround was using a
> file formated with ext4 and loop-mounted on /var/tmp/portage.

Strangely enough, with btrfs on the loop-file this works too.
Comment 28 Mark Nowiasz 2017-12-27 08:38:18 UTC
(In reply to Johannes Hirte from comment #27)
> (In reply to Johannes Hirte from comment #25)
> > Same here with linux-4.15-rc3 and btrfs. I've not noticed this before,
> > because /var/tmp/portage normally is a tmpfs. Another workaround was using a
> > file formated with ext4 and loop-mounted on /var/tmp/portage.
> 
> Strangely enough, with btrfs on the loop-file this works too.

Could it be that the bug only hits if btrfs is mounted as a subvolume?
Comment 29 Michał Górny archtester Gentoo Infrastructure gentoo-dev Security 2017-12-31 09:12:37 UTC
Ok, it just occurred to me that I've been hitting this on my third system (pure x86, btrfs root, no subvolumes), except in my case it was 'tar' failing to pack the binary package:

tar: ./usr/share/man/man3/sd_journal_seek_head.3: Plik usunięty zanim został przeczytany

[=> file removed before read]

Curiously enough, I've just tried rebuilding while watching the directory via inotify and it did not fail (possibly because inotify made it slower).

$ ls /usr/share/man/man3/sd_journal_seek_head.3*
/usr/share/man/man3/sd_journal_seek_head.3.lz


Which means the file is indeed not supposed to exist. So it looks to be a race condition. FWICS, someone turned ecompressdir into an awful multiprocessing bash hack that doesn't wait properly, and emerge proceeds with installing while ecompressdir is still compressing manpages.
Comment 30 Michał Górny archtester Gentoo Infrastructure gentoo-dev Security 2017-12-31 09:29:31 UTC
Hmm, I was a bit wrong there. That awful hack is used only to decompress already compressed files, so it is irrelevant here. Still, I suspect it's something in the awfully complex Portage logic and btrfs just happens to make it more likely to happen (e.g. because of being slow/fast).
Comment 31 Nils Holland 2018-01-01 00:11:50 UTC
(In reply to Michał Górny from comment #30)
> Hmm, I was a bit wrong there. That awful hack is used only to decompress
> already compressed files, so it is irrelevant here. Still, I suspect it's
> something in the awfully complex Portage logic and btrfs just happens to
> make it more likely to happen (e.g. because of being slow/fast).

Well, if this is really a kind of race condition issue resulting from the combination of portage and btrfs, I would suspect that it'd be pretty unlikely for the issue to only surface in context of systemd-236. One would instead expect similar behavior when emerging other ebuilds as well. However, ever since I first experienced the issue with systemd-236, I've done daily @world updates and never experienced anything similar with other packages. At the beginning of December, I rebuilt my complete systems as a result of switching to the new 17.0 profiles, also without any issues. In other words ... strange!

However, I might now have stumbled across something that could be the same or a similar issue in the context of something other than systemd-236. It is only a wild guess that it is related, so I'm sorry if the following turns out to be unrelated and points in the wrong direction. Still, I think it might be worthwhile to report it here:

So, tonight I was using catalyst-3.0.1 to build an installcd-stage1 on my x86 machine that also has the systemd-236 issue this bug report deals with. As part of that catalyst operation, app-accessibility/brltty-5.2-r1 is being built, and since I have the option that instructs catalyst to create binary packages of the stuff it builds enabled, the last step of emerging each package is just this creation of the binary package. And in two attempts, this very process failed with an error message that looks at least a bit similar to what we've been seeing with systemd-236, and which also suggests that it might be caused by some kind of race condition. Namely...:

set -- Linux Screen; \
for driver do (cd ./../Drivers/Screen/$driver && make install); done
make[2]: Entering directory '/var/tmp/portage/app-accessibility/brltty-5.2-r1/work/brltty-5.2/Drivers/Screen/Linux'
make[2]: Nothing to be done for 'install'.
make[2]: Leaving directory '/var/tmp/portage/app-accessibility/brltty-5.2-r1/work/brltty-5.2/Drivers/Screen/Linux'
make[2]: Entering directory '/var/tmp/portage/app-accessibility/brltty-5.2-r1/work/brltty-5.2/Drivers/Screen/Screen'
make[2]: Nothing to be done for 'install'.
make[2]: Leaving directory '/var/tmp/portage/app-accessibility/brltty-5.2-r1/work/brltty-5.2/Drivers/Screen/Screen'
make[1]: Leaving directory '/var/tmp/portage/app-accessibility/brltty-5.2-r1/work/brltty-5.2/Programs'
 * Final size of build directory: 74196 KiB (72.4 MiB)
 * Final size of installed tree:      8 KiB
tar: ./usr/share/man/man3/brlapi_tty.3: File removed before we read it
 * ERROR: app-accessibility/brltty-5.2-r1::gentoo failed (package phase):
 *   failed to pack binary package: '/var/gentoo/packages/app-accessibility/brltty-5.2-r1.tbz2.2438'
 *
 * Call stack:
 *       misc-functions.sh, line 666:  Called __dyn_package
 *       misc-functions.sh, line 546:  Called assert 'failed to pack binary package: '/var/gentoo/packages/app-accessibility/brltty-5.2-r1.tbz2.2438''
 *   isolated-functions.sh, line  16:  Called die
 * The specific snippet of code:
 *              [[ $x -eq 0 ]] || die "$@"
 *
 * If you need support, post the output of `emerge --info '=app-accessibility/brltty-5.2-r1::gentoo'`,
 * the complete build log and the output of `emerge -pqv '=app-accessibility/brltty-5.2-r1::gentoo'`.
!!! When you file a bug report, please include the following information:
GENTOO_VM=  CLASSPATH="" JAVA_HOME=""
JAVACFLAGS="" COMPILER=""
and of course, the output of emerge --info =brltty-5.2
 * The complete build log is located at '/var/tmp/portage/app-accessibility/brltty-5.2-r1/temp/build.log'.
 * The ebuild environment file is located at '/var/tmp/portage/app-accessibility/brltty-5.2-r1/temp/environment'.
 * Working directory: '/var/tmp/portage/app-accessibility/brltty-5.2-r1/temp'
 * S: '/var/tmp/portage/app-accessibility/brltty-5.2-r1/work/brltty-5.2'

I guess

"tar: ./usr/share/man/man3/brlapi_tty.3: File removed before we read it"

looks kind of suspicious. Unfortunately, I haven't yet been able to look into this more and probably do some more tests with regard to app-accessibility/brltty-5.2-r1. I might be able to do that later, but for the moment, all I can say is that this *might* be a similar issue to what we've been talking about here so far.
Comment 32 Zac Medico gentoo-dev 2018-01-02 02:04:38 UTC
(In reply to William from comment #23)
> gentoo sources 4.14.4 and 4.14.6, gcc 7.2.0, bash 4.4_p12, portage 2.3.19. 
> The partition is mounted with the following options:
> /dev/sda4 on / type btrfs
> (rw,noatime,compress=lzo,ssd,discard,space_cache,autodefrag,inode_cache,
> subvolid=1175,subvol=/__active/ROOT)
> 
> Anything else?

I wonder if the problem is related to any of these mount options?

If you have transparent compression enabled, then you might want to disable man page compression by setting PORTAGE_COMPRESS="" in make.conf.
Comment 33 Mark Nowiasz 2018-01-05 07:50:13 UTC
Strange: 236-r4 installs without the aforementioned problems on all of my systemd machines.
Comment 34 Zac Medico gentoo-dev 2018-01-15 00:30:20 UTC
Apparently this is a legitimate btrfs bug. I just noticed the same issue today on a system with a linux 4.14.13 kernel and mount options noatime,compress=zlib,discard,space_cache. In my case the issue was triggered by an rm command followed by an rsync command. The rsync command reported that one of the files I removed "has vanished", even though I ran rsync *after* the rm command had completed.
Comment 35 Johannes Hirte 2018-01-15 08:44:15 UTC
(In reply to Zac Medico from comment #32)
> (In reply to William from comment #23)
> > gentoo sources 4.14.4 and 4.14.6, gcc 7.2.0, bash 4.4_p12, portage 2.3.19. 
> > The partition is mounted with the following options:
> > /dev/sda4 on / type btrfs
> > (rw,noatime,compress=lzo,ssd,discard,space_cache,autodefrag,inode_cache,
> > subvolid=1175,subvol=/__active/ROOT)
> > 
> > Anything else?
> 
> I wonder if the problem is related to any of these mount options?
> 
> If you have transparent compression enabled, then you might want to disable
> man page compression by setting PORTAGE_COMPRESS="" in make.conf.

The two affected systems on my side have the following mount options:

rw,noatime,ssd,space_cache,autodefrag,subvolid=257,subvol=/rootfs

and 

rw,noatime,space_cache,autodefrag,subvolid=257,subvol=/rootfs

So it seems unrelated to any mount option. Can you provide a simple reproducer with your rm & rsync tests?
Comment 36 Zac Medico gentoo-dev 2018-01-15 10:45:50 UTC
Created attachment 514864 [details]
Test for stale (removed) files in btrfs directory listings

The attached script reliably reproduces the problem on a couple of my machines. By default, it creates files numbered 1 to 350, and then removes every 10th file. On the two machines I've tested, the file named 341 consistently shows up as a stale entry.
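
For readers without access to the attachment, the procedure described above can be sketched roughly as follows. This is a Python re-imagining, not the attached shell script: the function names are hypothetical, and the choice to start removing at file 1 (so that 341 is among the removed files) is a guess made to match the observation, since the script's exact selection is not shown in this thread.

```python
import os
import tempfile


def stale_after_removal(directory, count=350, step=10, first=1):
    """Create files 1..count, remove every `step`-th one, and return the
    names that were removed but still show up in a fresh directory listing.

    Assumption: removal starts at `first`; the attached script's exact
    choice is not shown in this thread.
    """
    for i in range(1, count + 1):
        with open(os.path.join(directory, str(i)), "w"):
            pass
    removed = set()
    for i in range(first, count + 1, step):
        os.unlink(os.path.join(directory, str(i)))
        removed.add(str(i))
    # A correct filesystem never lists a name after unlink() has returned.
    listed = set(os.listdir(directory))
    return sorted(removed & listed, key=int)


def run_once(parent=None):
    # `parent` should be a directory on the filesystem under test,
    # for example one on a btrfs mount.
    with tempfile.TemporaryDirectory(dir=parent) as scratch:
        return stale_after_removal(scratch)
```

On an unaffected filesystem run_once() should return an empty list; a non-empty result would correspond to the stale entries described in this comment.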
Comment 37 Nils Holland 2018-01-15 11:34:59 UTC
(In reply to Zac Medico from comment #36)
> Created attachment 514864 [details]
> Test for stale (removed) files in btrfs directory listings

Great, thanks!

> On the two machines I've tested, the file named 341
> consistently shows up as a stale entry.

Same here on my machine, and interestingly it's also file 341 that shows up as stale here. Now, this is the result when tested on my machine on which I was also experiencing the issue this bug report originally deals with during the systemd-236 merge. I couldn't yet run your test case on my machine on which systemd-236 merged fine, as I'm currently at work and that machine's at home and powered off.

Being at work, however, I was able to throw the testcase on a completely different (non-Gentoo) machine that is using btrfs and running a 4.1.12 kernel. It seems at first glance that the issue does not occur on that machine, although I have to admit that I haven't played with the testcase's parameters much, so this observation might be inconclusive, but it might suggest that it only slipped in during more recent kernel releases.

Anyway ... by now it'd probably be time to report the problem, along with the testcase, to the kernel's btrfs folks?
Comment 38 Zac Medico gentoo-dev 2018-01-15 12:04:12 UTC
Reported upstream as https://bugzilla.kernel.org/show_bug.cgi?id=198483.
Comment 39 Michael Jones 2018-01-23 18:17:44 UTC
*** Bug 645426 has been marked as a duplicate of this bug. ***
Comment 40 Mike Gilbert gentoo-dev 2018-01-23 22:11:04 UTC
A kernel patch has been posted for review upstream, if anyone would like to test it.

https://patchwork.kernel.org/patch/10181019/
Comment 41 Zac Medico gentoo-dev 2018-01-26 22:59:37 UTC
Linus has merged the patch:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?h=e4fd493c0541d36953f7b9d3bfced67a1321792f

It's not in v4.14.15 but hopefully will be in the next one.
Comment 44 Mike Gilbert gentoo-dev 2018-02-07 18:36:33 UTC
I guess we can call this fixed then.