Bug 490676

Summary: sys-apps/systemd: reaping /var/tmp content may lead to unexpected failures
Product: Gentoo Linux Reporter: Fabio Erculiani (RETIRED) <lxnay>
Component: Current packages Assignee: Gentoo systemd Team <systemd>
Status: RESOLVED FIXED    
Severity: normal CC: alexander, josef64, marcec, marduk
Priority: High    
Version: unspecified   
Hardware: All   
OS: Linux   
See Also: https://bugs.gentoo.org/show_bug.cgi?id=561404
https://bugs.gentoo.org/show_bug.cgi?id=603222
https://bugs.gentoo.org/show_bug.cgi?id=643386
https://bugs.gentoo.org/show_bug.cgi?id=900220
Whiteboard:
Package list:
Runtime testing required: ---

Description Fabio Erculiani (RETIRED) gentoo-dev 2013-11-07 10:14:45 UTC
/usr/lib/tmpfiles.d/tmp.conf contains the following tmpfiles.d statement:

d /var/tmp 1777 root root 30d

This means that files and directories in /var/tmp are automatically reaped by systemd-tmpfiles-clean (which runs daily IIRC) if they're older than 30 days.

This breaks anything that handles tarballs (Portage!) or creates ISO images, to name just two things that I personally run.

Basically, what happens is that emerge may be compiling, or even merging files to the live filesystem, when systemd-tmpfiles-clean kicks in and wipes all the data.
This can have severe consequences; there is no need to spell them out.

I really think that systemd shouldn't touch /var/tmp at all.
Comment 1 Michał Górny archtester Gentoo Infrastructure gentoo-dev Security 2013-11-07 10:21:23 UTC
Then how should it be cleaned? I've noticed the issue long ago when I kept my CVS checkout in /var/tmp -- and obviously the answer was that this was a bad location from day one.

However, I didn't think of unpacked tarballs. I'd personally prefer just running the wipe at boot but some people just dislike rebooting. I don't see a good way to satisfy all parties except for making portage unpack sources somewhere else. Of course, then we lose the ability of cleaning it automatically...
Comment 2 Michał Górny archtester Gentoo Infrastructure gentoo-dev Security 2013-11-07 10:22:17 UTC
As a side note, I think the tmpfiles.d format supports excludes, and we could make one for /var/tmp/portage.
Comment 3 Fabio Erculiani (RETIRED) gentoo-dev 2013-11-07 10:29:02 UTC
Protecting /var/tmp/portage seems to be the bare minimum.
Comment 4 Mike Gilbert gentoo-dev 2013-11-07 18:23:21 UTC
I just want to point out that systemd-tmpfiles determines a file's age based on the maximum of atime and mtime. So, this is only a problem if you have noatime enabled on the filesystem.
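
To make the rule concrete, here is a minimal Python sketch of the age check as described in this comment. It is not the systemd code: is_expired and MAX_AGE_SECONDS are hypothetical names, and the real tool may apply further checks that are not modelled here.

```python
import os
import tempfile
import time

# The "30d" age field from the tmp.conf line, in seconds.
MAX_AGE_SECONDS = 30 * 24 * 60 * 60


def is_expired(path, now, max_age=MAX_AGE_SECONDS):
    """Return True if the newer of atime and mtime is older than max_age.

    Hypothetical helper that only models the rule stated in the comment.
    """
    st = os.stat(path)
    newest = max(st.st_atime, st.st_mtime)
    return (now - newest) > max_age


def demo():
    """Show how backdating one or both timestamps changes the verdict."""
    forty_days = 40 * 24 * 60 * 60
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "example")
        open(path, "w").close()
        now = time.time()
        fresh = is_expired(path, now)
        # Old mtime, recent atime: the recent atime keeps the file alive.
        os.utime(path, (now, now - forty_days))
        old_mtime_only = is_expired(path, now)
        # Both timestamps old: the file becomes a removal candidate.
        os.utime(path, (now - forty_days, now - forty_days))
        both_old = is_expired(path, now)
    return fresh, old_mtime_only, both_old
```

Under this rule a file is only a candidate once both timestamps are past the limit, which is why code that sets both to an old value matters.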
Comment 5 Mike Gilbert gentoo-dev 2013-11-07 18:26:42 UTC
Actually, it doesn't even seem to be a problem with noatime, since an atime is still set when you create a file.

Have you actually encountered this problem on a real system, or is this just a passing thought you had?
Comment 6 Fabio Erculiani (RETIRED) gentoo-dev 2013-11-07 18:55:02 UTC
I had two failures that may be related to what systemd-tmpfiles-clean does, one with emerge and the other one with one of my ISO build tools.
I have been able to reproduce the latter one only once.

There must be some race condition somewhere and I am trying to build a test case.
Comment 7 Fabio Erculiani (RETIRED) gentoo-dev 2013-11-07 19:55:08 UTC
So, I don't know if this is the problem, but read on.
Look at [1].

The basic idea is that a package manager wants to preserve mtime and perhaps atime. I don't know if Portage preserves both, but sure enough, Python's tarfile module does this by default.
Suppose app.foo does mkdir("/var/tmp/x") followed by utime("/var/tmp/x", (mtime, mtime)). If systemd-tmpfiles-clean is scheduled at that point, it only checks mtime and atime (ctime changes anyway whenever the directory's entries change, for instance when you add or remove a file in the dir) and decides to remove /var/tmp/x.
Then app.foo is scheduled again and tries to do something inside /var/tmp/x, like creating a file, but the dir has already been removed and everything crashes.

Does it make sense to you? I haven't tried to reproduce it. I've just started reading the code at [1].

[1] http://cgit.freedesktop.org/systemd/systemd/tree/src/tmpfiles/tmpfiles.c#n356
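
The sequence above can be simulated with a small Python sketch. The sweep function is a hypothetical stand-in for the cleaner that only applies the "atime and mtime both too old" rule. It is not the systemd implementation, and it runs against a throwaway directory rather than /var/tmp.

```python
import os
import shutil
import tempfile
import time

THIRTY_DAYS = 30 * 24 * 60 * 60


def sweep(root, now, max_age=THIRTY_DAYS):
    """Toy cleaner: remove top-level entries whose atime and mtime are both old."""
    removed = []
    with os.scandir(root) as it:
        entries = list(it)  # snapshot first, then delete
    for entry in entries:
        st = entry.stat(follow_symlinks=False)
        if now - max(st.st_atime, st.st_mtime) > max_age:
            if entry.is_dir(follow_symlinks=False):
                shutil.rmtree(entry.path)
            else:
                os.unlink(entry.path)
            removed.append(entry.name)
    return removed


def simulate_race():
    """mkdir + utime with an old timestamp, then a sweep, then a late write."""
    with tempfile.TemporaryDirectory() as root:
        workdir = os.path.join(root, "x")
        os.mkdir(workdir)
        old = time.time() - 365 * 24 * 60 * 60
        os.utime(workdir, (old, old))  # app restores an old timestamp
        removed = sweep(root, time.time())  # cleaner runs in between
        try:
            open(os.path.join(workdir, "file"), "w").close()
            write_failed = False
        except FileNotFoundError:
            write_failed = True
    return removed, write_failed
```

Whether the real cleaner can actually hit this window is exactly the open question here; the sketch only shows that the sequence is self-consistent.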
Comment 8 Mike Gilbert gentoo-dev 2013-11-07 20:42:58 UTC
(In reply to Fabio Erculiani from comment #7)

If there really is code that sets atime = mtime, then yes, that seems feasible.
Comment 9 Michał Górny archtester Gentoo Infrastructure gentoo-dev Security 2013-11-08 21:35:58 UTC
Just for the record: if someone commits this, please try to put it in gentoo-systemd-integration and not hack the upstream-supplied files.
Comment 10 Michał Górny archtester Gentoo Infrastructure gentoo-dev Security 2014-10-30 09:14:58 UTC
Any update on this?
Comment 11 Pacho Ramos gentoo-dev 2014-10-30 12:53:39 UTC
I have never been able to reproduce this :/. Also, I don't see how it can break for noatime users (I have used it everywhere for ages... but for /var/tmp/portage I use tmpfs, so...). With noatime, files should be removed if they haven't been *modified* for 30 days and, in that case, I would consider their removal more a feature than a fault, as it would save me from having to manually remove old compilation files from failed merges.
Comment 12 Pacho Ramos gentoo-dev 2014-11-11 09:20:16 UTC
On the other hand, I think we should exclude /var/tmp/ccache as it's entirely handled on its own depending on our configuration
Comment 13 Pacho Ramos gentoo-dev 2015-09-11 09:10:14 UTC
I have just seen with "ls -lR /var/tmp/ccache" that it indeed contains no files older than 1 month :S. So we should probably exclude /var/tmp/ccache and /var/tmp/portage from the automatic removal after 30d (ccache has its own mechanisms to control the space it uses).
Comment 14 Mike Gilbert gentoo-dev 2015-09-11 13:02:39 UTC
(In reply to Pacho Ramos from comment #12)
> On the other hand, I think we should exclude /var/tmp/ccache as it's
> entirely handled on its own depending on our configuration

I would suggest adding a tmpfiles entry in the dev-util/ccache ebuild.

x /var/tmp/ccache
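
As a rough illustration of what such an exclusion means for the cleaner, here is a Python sketch in which a path is protected if it, or any parent directory, matches an "x" pattern. The is_excluded helper is hypothetical. It also uses Python's fnmatch, where "*" can match "/", so glob behaviour may differ from what systemd-tmpfiles does.

```python
import fnmatch
import posixpath


def is_excluded(path, patterns):
    """Return True if path or one of its parents matches an exclusion pattern.

    Hypothetical helper; models an exclusion that covers a directory
    and everything below it.
    """
    path = posixpath.normpath(path)
    candidates = [path]
    while path not in ("/", ""):
        path = posixpath.dirname(path)
        candidates.append(path)
    return any(
        fnmatch.fnmatchcase(candidate, pattern)
        for candidate in candidates
        for pattern in patterns
    )


# Example exclusion list, using the ccache directory discussed above.
EXCLUDES = ["/var/tmp/ccache"]
```

With that list, is_excluded("/var/tmp/ccache/a/b.o", EXCLUDES) is true, while a sibling such as /var/tmp/other stays subject to cleanup.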
Comment 15 Mike Gilbert gentoo-dev 2015-09-11 13:08:10 UTC
(In reply to Michał Górny from comment #10)
> Any update on this?

I'm not making any changes until someone gives me a real usage scenario where this cleanup rule is harmful.

To that end, marking this WORKSFORME until someone proves otherwise.
Comment 16 SpanKY gentoo-dev 2015-09-24 19:15:40 UTC
it seems like pruning random files in random subdirs is a bad idea regardless.  generally the whole point of having a subdir is to put related objects in there.  when you start cleaning out random parts, things are bound to fail.

why is /var/tmp/ treated differently from /tmp/ ?
Comment 17 Mike Gilbert gentoo-dev 2015-09-24 19:20:58 UTC
(In reply to SpanKY from comment #16)
> why is /var/tmp/ treated differently from /tmp/ ?

/tmp is cleaned even more aggressively than /var/tmp; mainly because it is mounted as a tmpfs by default.
Comment 18 SpanKY gentoo-dev 2015-09-24 20:55:12 UTC
(In reply to Mike Gilbert from comment #17)

that sounds even worse.  if i have a daemon that creates a tempdir, populates a few files in there, and then keeps running, i have to worry about any of those getting yanked out from underneath me even while it's running ?  that's asinine.

/tmp is not a transcendent memory storage, and this seems to violate FHS.  while clearing of a program's temp files are permitted across invocations, it does not allow for clearing while that program is running.
Comment 19 Mike Gilbert gentoo-dev 2015-09-24 21:30:49 UTC
(In reply to SpanKY from comment #18)
> that sounds even worse.  if i have a daemon that creates a tempdir,
> populates a few files in there, and then keeps running, i have to worry
> about any of those getting yanked out from underneath me even while it's
> running ?  that's asinine.

I disagree; if your daemon does not access the files for over 10 days, it seems reasonable to assume they are unneeded.

> /tmp is not a transcendent memory storage, and this seems to violate FHS. 
> while clearing of a program's temp files are permitted across invocations,
> it does not allow for clearing while that program is running.

As far as I can tell, FHS does not prohibit it either.
Comment 20 Mike Gilbert gentoo-dev 2015-09-24 21:32:28 UTC
Also, the sysadmin is free to override these settings if he so chooses. These are simply defaults that are shipped by systemd upstream.
Comment 21 SpanKY gentoo-dev 2015-09-25 05:23:15 UTC
(In reply to Mike Gilbert from comment #19)

except that's not how the fs works.  if a daemon generates a file and then only reads it, the mtime/ctime aren't going to be updated once it's been generated, nor will the atime.  Linux defaults to relatime, and historically many distros (us included) highly recommended/documented using noatime everywhere.

FHS does not explicitly prohibit it, but the intention is pretty clear.  when it says:
  Programs must not assume that any files or directories in /tmp are preserved
  between invocations of the program.
any reasonable person is going to read that as meaning files will stick around while their program continues to run.

more succinctly, if someone filed a report:
 - i launched daemon FOO
 - i periodically run `rm /tmp/xxx` on files that FOO is actively using
 - FOO crashes/fails
no one is going to reasonably say that's a bug in FOO -- it's purely pebkac.  the default behavior of systemd is to do exactly that.

and no, the logic "a system admin can override them" isn't acceptable.  if that's the case, then i can simply punt bug 561404 with that excuse -- if you cared about the data in the ccache dir, then clearly you would have written a rule to exempt it.
Comment 22 Michał Górny archtester Gentoo Infrastructure gentoo-dev Security 2015-09-25 06:15:04 UTC
I'm with vapier here. I didn't know tmpfiles used to run daily -- I think we should really limit it to bootup.
Comment 23 Richard Freeman gentoo-dev 2015-09-25 11:05:35 UTC
How is this different from the default behavior of tmpreaper?
Comment 24 Mike Gilbert gentoo-dev 2015-09-25 12:31:23 UTC
(In reply to SpanKY from comment #21)
> Linux defaults to relatime, and historically
> many distros (us included) highly recommended/documented using noatime
> everywhere.

Ah, you make a good point with noatime/relatime.

It still seems kind of unlikely that a process would create and close a temp file that needs to persist for a long period of read-only access.

ccache does seem to fit that profile. I do wish we/they had picked a better default location than a temp directory for their cache files though.
Comment 25 Mike Gilbert gentoo-dev 2015-09-25 13:29:35 UTC
I'm ok with disabling cleanup on /var/tmp to prevent breakage of ccache and due to the noatime/relatime possibility.

/tmp is a bit less clear: systemd mounts this as a tmpfs.

[Mount]
What=tmpfs
Where=/tmp
Type=tmpfs
Options=mode=1777,strictatime

The noatime/relatime problem does not apply due to the strictatime mount option.

As well, it can grow to 50% of physical memory; it would be nice to remove stale files to clean up memory.

Given this, does it make sense to leave the /tmp cleaning in place? Or do we really need to cater to processes that create temp files and then don't touch them for > 10 days?
Comment 26 SpanKY gentoo-dev 2015-09-25 13:45:27 UTC
(In reply to Mike Gilbert from comment #24)

it doesn't seem unreasonable for a daemon to generate a cache file that is then read-mostly, or for long-running compute programs to generate partial state as they go.  it might not be optimal, but it does seem unreasonable that people now have to adjust their scripts to run `touch` on their files lest they be reaped while the program is still running.

wrt ccache, the default is ~/.ccache/.  /var/tmp comes into play via portage: we've set PORTAGE_TMPDIR=/var/tmp (because FHS says /var/tmp survives reboots), and then portage defaults ccache off of that.  it's probably the best location out of the other options (/var/cache and /var/lib and ~/ paths seem less appropriate).

along those lines, portage unpacks tarballs (source & binpkgs) in there which can have old [mtime] timestamps, although ctime should be set to $now.  off the top of my head i can't think of a way where ctime would also be old, but it does make me uneasy.
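
a quick way to see the timestamp behaviour described here is a python sketch using the stdlib tarfile module (not portage's own unpack path, which may differ).  OLD_MTIME and unpack_old_member are made-up names for illustration.

```python
import io
import os
import tarfile
import tempfile

OLD_MTIME = 946684800  # 2000-01-01 00:00:00 UTC, an arbitrary old timestamp


def unpack_old_member(dest):
    """Build a one-file tarball in memory with an old mtime, extract it
    into dest, and return the stat result of the extracted file."""
    payload = b"hello\n"
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        info = tarfile.TarInfo("src/hello.txt")
        info.size = len(payload)
        info.mtime = OLD_MTIME
        tar.addfile(info, io.BytesIO(payload))
    buf.seek(0)
    # Newer Python versions accept an extraction filter; older ones do not.
    extra = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
    with tarfile.open(fileobj=buf, mode="r") as tar:
        tar.extractall(dest, **extra)
    return os.stat(os.path.join(dest, "src", "hello.txt"))


def demo():
    """Return (mtime, atime, ctime) of a freshly extracted old member."""
    with tempfile.TemporaryDirectory() as dest:
        st = unpack_old_member(dest)
    return st.st_mtime, st.st_atime, st.st_ctime
```

the extracted file carries the archive's old mtime (tarfile sets atime to the same value), while ctime reflects the time of extraction.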

is there a rash of cases where /tmp and /var/tmp are filling up on people such that there's enough of a concern to start trimming data behind the scenes ?  even then, these settings don't immediately resolve such problems ... if a process filled /tmp, then you'd have to wait 10 days for it to auto-fix itself, and that assumes it doesn't keep filling it.  at that point, i'd expect an admin to already have manually resolved the issue.
Comment 27 Mike Gilbert gentoo-dev 2015-09-25 14:53:31 UTC
I have disabled cleaning of /tmp and /var/tmp in systemd 218-r4, 226-r1, and 9999.

https://gitweb.gentoo.org/repo/gentoo.git/commit/?id=884081f76bfb615b4ff37f2cbebe02195a94d6d6
Comment 28 Michał Górny archtester Gentoo Infrastructure gentoo-dev Security 2015-09-25 15:27:45 UTC
Not that i find this really important but i think this would better fit in gentoo-systemd-integration. I guess you forgot my old comment about that.
Comment 29 Mike Gilbert gentoo-dev 2015-09-25 17:24:08 UTC
(In reply to Michał Górny from comment #28)
> Not that i find this really important but i think this would better fit in
> gentoo-systemd-integration. I guess you forgot my old comment about that.

I saw the comment, and decided against it.
Comment 30 SpanKY gentoo-dev 2015-09-25 18:52:10 UTC
another thought: unless systemd is walking all mount namespaces, this reaping doesn't help all that much wrt keeping memory from filling up.  it's also a bit of a fool's errand as deleting the file doesn't actually reclaim the memory until all handles to it are closed.
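
the "deleting doesn't reclaim until handles are closed" part is easy to show with a small python sketch (POSIX only).  read_after_unlink is a made-up helper and it uses the default temp directory, which may or may not be a tmpfs on a given box.

```python
import os
import tempfile


def read_after_unlink(data=b"still here"):
    """Write data to a temp file, unlink it while the handle is open,
    then read the data back through the still-open handle."""
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "w+b") as handle:
        handle.write(data)
        handle.flush()
        os.unlink(path)  # the name is gone from the directory...
        name_gone = not os.path.exists(path)
        handle.seek(0)
        contents = handle.read()  # ...but the storage is still held
    return name_gone, contents
```

the name disappears immediately, but the contents stay readable (and keep occupying storage) until the last handle is closed.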

any unprivileged user can easily:
(1) create a user ns
(2) map their active uid to root
(3) create a mount ns
(4) mount a new tmpfs at /tmp in their mount ns
(5) fill it with cruft thus using up memory
(6) never exit

to see this locally, simply run:
$ unshare -mUr --propagation private
# mount -t tmpfs tmpfs /tmp
# ls /tmp

on the upside, when the process *does* exit, all their temp files will automatically be reaped by the kernel when the mount ns is destroyed.  so if the goal is not to limit memory usage (that's really what cgroups are for) but to keep processes from leaking temp files, it looks like we have a solution already available, and systemd itself could leverage it whenever it spawns a new daemon/cronjob/whatever via its unit/whatever files.

i'm in the process of deploying this exact mechanism in sandbox so that every package build has a unique/clean/reaped /tmp dir.
Comment 31 Mike Gilbert gentoo-dev 2015-09-25 19:32:01 UTC
(In reply to SpanKY from comment #30)
> on the upside, when the process *does* exit, all their temp files will
> automatically be reaped by the kernel when the mount ns is destroyed.  so if
> the goal is not to limit memory usage (that's really what cgroups are for)
> but to keep processes from leaking temp files, it looks like we have a
> solution already available, and systemd itself could leverage it whenever it
> spawns a new daemon/cronjob/whatever via its unit/whatever files.

systemd already supports this.

http://www.freedesktop.org/software/systemd/man/systemd.exec.html#PrivateTmp=
Comment 32 Richard Freeman gentoo-dev 2015-09-25 19:51:55 UTC
(In reply to Mike Gilbert from comment #31)
> (In reply to SpanKY from comment #30)
> > on the upside, when the process *does* exit, all their temp files will
> > automatically be reaped by the kernel when the mount ns is destroyed.  so if
> > the goal is not to limit memory usage (that's really what cgroups are for)
> > but to keep processes from leaking temp files, it looks like we have a
> > solution already available, and systemd itself could leverage it whenever it
> > spawns a new daemon/cronjob/whatever via its unit/whatever files.
> 
> systemd already supports this.
> 
> http://www.freedesktop.org/software/systemd/man/systemd.exec.html#PrivateTmp=

Indeed, and from my experience most services tend to use this already.

The private tmp dirs by default are just sub-directories of /tmp.  I wouldn't be surprised if this was configurable.
Comment 33 SpanKY gentoo-dev 2015-09-25 20:46:03 UTC
(In reply to Mike Gilbert from comment #31)

then i'm even more mystified as to why the tmpfiles.d reaping is even bothered with, especially when it only impacts daemons/scripts that either (1) have specifically not opted in or (2) are written w/out this knowledge of possible cleaning or (3) are often legacy in nature in which case updating them in the face of this isn't that realistic of an option.

but maybe we've kicked this horse/glue enough at this point and we just move on to the Next Big Shed ;).
Comment 34 Marc Joliet 2015-09-25 22:58:45 UTC
It seems it was overlooked that the "v" (and "V") type was added in systemd-219 (see http://cgit.freedesktop.org/systemd/systemd/tree/NEWS).  Sure enough, after upgrading to systemd-218-r4 and rebooting I get this:

# systemctl --failed
  UNIT                               LOAD   ACTIVE SUB    DESCRIPTION
● systemd-tmpfiles-setup-dev.service loaded failed failed Create Static Device Nodes in /dev
● systemd-tmpfiles-setup.service     loaded failed failed Create Volatile Files and Directories

LOAD   = Reflects whether the unit definition was properly loaded.
ACTIVE = The high-level unit activation state, i.e. generalization of SUB.
SUB    = The low-level unit activation state, values depend on unit type.

2 loaded units listed. Pass --all to see loaded but inactive units, too.
To show all installed unit files use 'systemctl list-unit-files'.

I have to ask: was a direct-to-stable revbump really warranted?

But on a different note, while I'm commenting here anyway: what about stuff like /var/tmp/kdecache-*?  Does KDE clean those directories by itself, or will they now grow indefinitely?
Comment 35 Mike Gilbert gentoo-dev 2015-09-26 01:54:07 UTC
commit 8595c126a7159621855791860b74f7d40b7eeed0
Author: Mike Gilbert <floppym@gentoo.org>
Date:   Fri Sep 25 21:52:46 2015 -0400

    sys-apps/systemd: Fix noclean-tmp patch for 218
    
    Package-Manager: portage-2.2.21_p119
Comment 36 Mike Gilbert gentoo-dev 2015-09-26 02:02:16 UTC
(In reply to Marc Joliet from comment #34)

Sorry for the log spam, but nothing really broke here; your system already has /tmp and /var/tmp, so the tmpfiles entry that creates them is a bit redundant. Regardless, I missed it when cherry-picking the patch from my own branch.
Comment 37 Albert W. Hopkins 2015-09-26 20:55:24 UTC
(In reply to Mike Gilbert from comment #36)
> (In reply to Marc Joliet from comment #34)
> 
> Sorry for the log spam, but nothing really broke here; your system already
> has /tmp and /var/tmp, so the tmpfiles entry that creates them is a bit
> redundant. Regardless, I missed it when cherry-picking the patch from my own
> branch.

I'm not certain that Marc Joliet's comment is completely invalid.  For example, I just upgraded a system to 218-r4 and now it always boots in degraded mode with the same two services in a failed state.  Because they are failed, any services that depend on them finishing will not get started.
Comment 38 Marc Joliet 2015-09-26 23:16:17 UTC
(In reply to Albert W. Hopkins from comment #37)
> (In reply to Mike Gilbert from comment #36)
> > (In reply to Marc Joliet from comment #34)
> > 
> > Sorry for the log spam, but nothing really broke here; your system already
> > has /tmp and /var/tmp, so the tmpfiles entry that creates them is a bit
> > redundant. Regardless, I missed it when cherry-picking the patch from my own
> > branch.
> 
> I'm not certain that Marc Joliet's comment is completely invalid.  For
> example, I just upgraded a system to 218-r4 and now it always boots in
> degraded mode with the same two services in a failed state.  Because they
> are failed, any services that depend on them finishing will not get started.

Well, speaking only for my system: it had no other failed services, and I could log into KDE just fine, audio still worked, etc..

(The thing that worried me was whether that was random or not, i.e., does a failure in one tmpfiles.d file cause systemd-tmpfiles to stop further processing or not?  I *expect* that to not be the case, but don't actually know.)

But whatever, it's fixed now :) .
Comment 39 Mike Gilbert gentoo-dev 2017-12-31 15:13:20 UTC
As a heads-up to anyone following this bug: I have re-reviewed the possible failure modes here, and decided to reinstate the 10 day / 30 day cleanup in sys-apps/systemd-236-r4.
Comment 40 Larry the Git Cow gentoo-dev 2018-01-02 02:49:08 UTC
The bug has been referenced in the following commit(s):

https://gitweb.gentoo.org/repo/gentoo.git/commit/?id=e5f430619b1dbdce1d9ba7836db55fee065798dd

commit e5f430619b1dbdce1d9ba7836db55fee065798dd
Author:     Mike Gilbert <floppym@gentoo.org>
AuthorDate: 2017-12-31 15:18:27 +0000
Commit:     Mike Gilbert <floppym@gentoo.org>
CommitDate: 2018-01-02 02:48:06 +0000

    sys-apps/portage: exclude /var/tmp/ccache from tmpfiles cleanup
    
    By default, systemd-tmpfiles removes files older than 30 days from /var/tmp.
    The default portage config sets CCACHE_DIR=/var/tmp/ccache.
    
    Bug: https://bugs.gentoo.org/490676#c14
    Package-Manager: Portage-2.3.19_p3, Repoman-2.3.6_p37

 sys-apps/portage/files/portage-ccache.conf                           | 2 ++
 sys-apps/portage/{portage-2.3.19.ebuild => portage-2.3.19-r1.ebuild} | 4 +++-
 sys-apps/portage/portage-9999.ebuild                                 | 4 +++-
 3 files changed, 8 insertions(+), 2 deletions(-)
Comment 41 Larry the Git Cow gentoo-dev 2020-09-06 17:35:28 UTC
The bug has been referenced in the following commit(s):

https://gitweb.gentoo.org/repo/gentoo.git/commit/?id=2af9e69f3462abfcf97679a24897b708f522059e

commit 2af9e69f3462abfcf97679a24897b708f522059e
Author:     Zac Medico <zmedico@gentoo.org>
AuthorDate: 2020-09-06 17:16:26 +0000
Commit:     Zac Medico <zmedico@gentoo.org>
CommitDate: 2020-09-06 17:35:20 +0000

    sys-apps/portage: Migrate to tmpfiles eclass
    
    The systemd_dotmpfilesd function is deprecated. Note that this
    effectively forces upgrade to >=sys-apps/openrc-0.23 for OpenRC users,
    since opentmpfiles blocks older versions of OpenRC (bug 643386).
    This should be acceptable, since the older OpenRC versions are nearly
    4 years old now.
    
    Closes: https://bugs.gentoo.org/740600
    Bug: https://bugs.gentoo.org/490676
    Bug: https://bugs.gentoo.org/643386
    Bug: https://bugs.gentoo.org/740638
    Package-Manager: Portage-3.0.5, Repoman-3.0.1
    Signed-off-by: Zac Medico <zmedico@gentoo.org>

 sys-apps/portage/portage-2.3.103-r1.ebuild | 4 ++--
 sys-apps/portage/portage-2.3.99-r2.ebuild  | 4 ++--
 sys-apps/portage/portage-3.0.4-r1.ebuild   | 4 ++--
 sys-apps/portage/portage-3.0.5.ebuild      | 4 ++--
 sys-apps/portage/portage-9999.ebuild       | 4 ++--
 5 files changed, 10 insertions(+), 10 deletions(-)