Gentoo Websites Logo
Go to: Gentoo Home Documentation Forums Lists Bugs Planet Store Wiki Get Gentoo!

Bug 644366

Summary: [Future EAPI] automatic keepdir for empty directories
Product: Gentoo Hosted Projects Reporter: Michael Orlitzky <mjo>
Component: PMS/EAPIAssignee: PMS/EAPI <pms>
Status: RESOLVED NEEDINFO    
Severity: normal CC: esigra, mgorny, tsmksubc
Priority: Normal    
Version: unspecified   
Hardware: All   
OS: Linux   
Whiteboard:
Package list:
Runtime testing required: ---
Bug Depends on:    
Bug Blocks: 174380    

Description Michael Orlitzky gentoo-dev 2018-01-12 20:35:50 UTC
The PMS states that empty directories have undefined behavior:

  https://projects.gentoo.org/pms/6/pms.html#x1-15400012.2.2

To avoid that, we have two directory creation functions, dodir and keepdir:

  https://projects.gentoo.org/pms/6/pms.html#x1-14000011.3.3.9

There are a few problems with requiring developers to know whether or not a directory is going to be empty, and to use the correct function.

  1. Aesthetically, it's just ugly to have two functions that do (from
     the user's perspective) the same thing.

  2. Many empty directories are created by the package's build system.
     At best, this results in an install phase that looks like,

       emake DESTDIR="${D}" install
       keepdir dir1
       ...
       keepdir dir12

     But more commonly, the keepdirs are omitted, because the developer
     doesn't know what the build system is doing.

  3. Using dodir instead of keepdir works in Portage, so the problem
     is buried in the PMS, and many packages now rely on the undefined
     behavior.

The second and third items make this difficult to address through (say) a QA warning. I propose that we make the package manager smart enough to handle empty directories by performing the keepdir itself.

In short, in a new EAPI, we would,

  a. Ban keepdir.

  b. Update the PMS: between src_install and pkg_preinst (or perhaps after
     pkg_preinst but before merging), the PM itself should call its keepdir
     function on every empty directory in $ED.

  c. Update the devmanual and package manager documentation to suggest
     dodir (or nothing at all, if the empty directory is created by the
     package's build system) wherever keepdir is used in the new EAPI.

And optionally:

  d. Make it an error to merge an empty directory. In other words, require
     any empty directories to be created in pkg_postinst().

If it is desirable to specify the command that locates empty directories, then the following should work:

  find "${ED}" -mindepth 1 -type d -empty

Some discussion about this took place on the -dev list:

https://archives.gentoo.org/gentoo-dev/message/a20ce7fc43dbac738fa412ba7c07db1f

And there is a QA check proposed to alert people of the problem:

https://archives.gentoo.org/gentoo-portage-dev/message/421fa08163401d0464d8349efc84dfdf
Comment 1 Ulrich Müller gentoo-dev 2018-01-12 21:17:03 UTC
(In reply to Michael Orlitzky from comment #0)
>   b. Update the PMS: between src_install and pkg_preinst (or perhaps after
>      pkg_preinst but before merging), the PM itself should call its keepdir
>      function on every empty directory in $ED.

Directories that are legitimately empty seem to be the exception, not the norm. So I'd rather suggest that the PM should remove any empty directory that it meets at that point.

The problem is also that after src_install there is no way for the PM to reliably determine if the directory will be empty (think of binpkgs).
Comment 2 Michael Orlitzky gentoo-dev 2018-01-12 21:40:43 UTC
(In reply to Ulrich Müller from comment #1)
> (In reply to Michael Orlitzky from comment #0)
> >   b. Update the PMS: between src_install and pkg_preinst (or perhaps after
> >      pkg_preinst but before merging), the PM itself should call its keepdir
> >      function on every empty directory in $ED.
> 
> Directories that are legitimately empty seem to be the exception, not the
> norm. So I'd rather suggest that the PM should remove any empty directory
> that it meets at that point.

Since this is a new EAPI, you could do that without breaking any existing packages, but it would still result in a lot of code like

  src_install() {
    default
    
    # These are all created by the build system, but the package manager
    # deletes them since they're empty at first.
    /var/spool/postfix/active
    /var/spool/postfix/flush
    /var/spool/postfix/deferred
    /var/spool/postfix/saved
    /var/spool/postfix/trace
    /var/spool/postfix/hold
    /var/spool/postfix/maildrop
    /var/spool/postfix/corrupt
    /var/spool/postfix/incoming
    /var/spool/postfix/defer
    /var/spool/postfix/bounce
  }

A lot of packages do something similar. The most popular are probably /var/log/${PN} and /var/lib/${PN}.
Comment 3 Michael Orlitzky gentoo-dev 2018-01-12 21:42:00 UTC
And on the topic of omitting keepdir, I forgot to copy/paste "keepdir" before every one of those directories above.
Comment 4 Michał Górny archtester Gentoo Infrastructure gentoo-dev Security 2018-01-12 22:32:29 UTC
How are you planning to deal with file collisions between multiple packages installing the same empty directory?
Comment 5 Michael Orlitzky gentoo-dev 2018-01-12 22:38:06 UTC
(In reply to Michał Górny from comment #4)
> How are you planning to deal with file collisions between multiple packages
> installing the same empty directory?

I don't see it as a special case. Presumably they would block one another.

It's not perfect, but consider that the alternatives on the table involve having both packages call "keepdir" manually in src_install, leading to exactly the same problem.
Comment 6 Michał Górny archtester Gentoo Infrastructure gentoo-dev Security 2018-01-12 23:44:35 UTC
(In reply to Michael Orlitzky from comment #5)
> (In reply to Michał Górny from comment #4)
> > How are you planning to deal with file collisions between multiple packages
> > installing the same empty directory?
> 
> I don't see it as a special case. Presumably they would block one another.

Imagine a build system calls 'install -d' unconditionally, then installs a file conditionally. This way, you can easily get a lot of colliding empty directories like /usr/share/locale and so on. Sounds like you want people to manually strip those directories a lot.

> It's not perfect, but consider that the alternatives on the table involve
> having both packages call "keepdir" manually in src_install, leading to
> exactly the same problem.

Actually, keepdir is collision-safe. I'm not sure if PMS guarantees it but then, PMS doesn't describe collision-protect at all, so it could be considered a combined part of the implementation.
Comment 7 Michael Orlitzky gentoo-dev 2018-01-12 23:57:46 UTC
(In reply to Michał Górny from comment #6)
> > > How are you planning to deal with file collisions between multiple packages
> > > installing the same empty directory?
> 
> Imagine a build system calls 'install -d' unconditionally, then installs a
> file conditionally. This way, you can easily get a lot of colliding empty
> directories like /usr/share/locale and so on. Sounds like you want people to
> manually strip those directories a lot.

Hmmm... the Right Thing To Do would be to patch the build system when you do the EAPI bump, and to send the patch upstream. Then only if the patch is not accepted would you hack around the empty directory in the ebuild (delete it if it's empty at the end of src_install).

I have no idea how prevalent this is, so it's a good point.


> > It's not perfect, but consider that the alternatives on the table involve
> > having both packages call "keepdir" manually in src_install, leading to
> > exactly the same problem.
> 
> Actually, keepdir is collision-safe. I'm not sure if PMS guarantees it but
> then, PMS doesn't describe collision-protect at all, so it could be
> considered a combined part of the implementation.

Indeed, I chose "the PM itself should call its keepdir function" to ensure that the auto-keepdir is no worse than a (comprehensive) manual-keepdir would be. If we want to make the auto-keepdir collision-safe, we could just update the PMS to say that keepdir itself is collision-safe in the new EAPI. Manually invoking keepdir will be banned anyway in the new EAPI, so it should only affect the auto-keepdir behavior.
Comment 8 Michał Górny archtester Gentoo Infrastructure gentoo-dev Security 2018-01-13 09:29:15 UTC
You still didn't clarify how it is supposed to do that. Because if that means using the current method, then we're gonna end up with a lot of useless .keep* files all around the place just because some package happened to have a meaningless empty directory.

So, instead of pursuing keepdir for a few valid empty directories (which could be logical to some point), we're end up pursuing a lot of unnecessary empty directories marked for preservation which I imagine as much more work and totally reverse to logic.
Comment 9 Ciaran McCreesh 2018-01-13 12:05:52 UTC
It would not be hard to write an autokeepdir eclass, which developers could try using and we could see how much mess it makes...
Comment 10 Michael Orlitzky gentoo-dev 2018-01-13 17:39:27 UTC
(In reply to Michał Górny from comment #8)
> 
> So, instead of pursuing keepdir for a few valid empty directories (which
> could be logical to some point), we're end up pursuing a lot of unnecessary
> empty directories marked for preservation which I imagine as much more work
> and totally reverse to logic.

I understand that, but the trade-off looks good to me.

To summarize: we have two equally valid proposals, either of which works in a new EAPI. The first is to automatically call keepdir on empty directories, and the second is to delete them.

If we automatically call keepdir, then we will wind up with some unnecessarily-kept directories, like your example /usr/share/locale. And if we delete empty directories, then we force developers (in some cases) to repeat what build system has already done, and call keepdir on legitimate empty directories created by the build system.

The reason I prefer the extra ".keep" files is because they're not permanent. If the PM deletes my empty directories, then I'm stuck with

  src_install() {
    ...
    keepdir /var/nagios/rw
    keepdir /var/nagios/spool/checkresults
    keepdir /var/nagios/archives
  }

for the rest of my life.

On the other hand, if my package creates an empty /usr/share/locale and then doesn't use it, there is at least the possibility that I can fix the build system to not do that. Neither problem really hurts anything, but one leads to tiny improvements in upstream build systems and can eventually be minimized.

Which option is more practical probably depends on the number of packages that create "legitimate" empty directories versus the number that create illegitimate ones. My spider sense says that daemons tend to create legitimate ones, while desktop packages probably create empty ones by accident more often.

I think legitimate empty directories are more prevalent than you think, and I think you think illegitimate ones are more prevalent than I think, and maybe we're both right, who knows.
Comment 11 Ulrich Müller gentoo-dev 2018-01-13 20:43:35 UTC
(In reply to Michael Orlitzky from comment #10)
> To summarize: we have two equally valid proposals, either of which works in
> a new EAPI. The first is to automatically call keepdir on empty directories,
> and the second is to delete them.

The second proposal provides a well-defined interface giving the ebuild maintainer full control. Whereas the first uses a heuristic method trying to determine which of the empty directories are legitimate.

> If we automatically call keepdir, then we will wind up with some
> unnecessarily-kept directories, like your example /usr/share/locale.

Potentially, each one with many .keep* files in it.

> And if we delete empty directories, then we force developers (in some cases)
> to repeat what build system has already done, and call keepdir on legitimate
> empty directories created by the build system.

This is was we have already now. PMS says that behaviour for empty directories is undefined, so ebuilds cannot not rely on anything there.

> The reason I prefer the extra ".keep" files is because they're not
> permanent. If the PM deletes my empty directories, then I'm stuck with
> 
>   src_install() {
>     ...
>     keepdir /var/nagios/rw
>     keepdir /var/nagios/spool/checkresults
>     keepdir /var/nagios/archives
>   }
> 
> for the rest of my life.

And what is wrong with these commands?
Comment 12 Michael Orlitzky gentoo-dev 2018-01-13 21:22:25 UTC
(In reply to Ulrich Müller from comment #11)
> (In reply to Michael Orlitzky from comment #10)
> > To summarize: we have two equally valid proposals, either of which works in
> > a new EAPI. The first is to automatically call keepdir on empty directories,
> > and the second is to delete them.
> 
> The second proposal provides a well-defined interface giving the ebuild
> maintainer full control. Whereas the first uses a heuristic method trying to
> determine which of the empty directories are legitimate.

We already have an interface providing the maintainer full control: if you don't want empty directories, don't create them. Just like anything else the ebuild creates. The command `rm -r` says "I don't want this directory" in exactly the same way that "keepdir" says "I want this directory."

We have a choice between enumerating the legitimate ones and enumerating the superfluous ones. Enumerating the superfluous ones is more consistent (how do we handle superfluous files?), and has less of a penalty if you decide not to do it for whatever reason.

 
> This is was we have already now. PMS says that behaviour for empty
> directories is undefined, so ebuilds cannot not rely on anything there.

True enough, but we can't actually enforce it without breaking a bazillion existing packages. In a new EAPI, we could enforce it, but we also get to re-think whether or not the status quo is the best design.


> >   src_install() {
> >     ...
> >     keepdir /var/nagios/rw
> >     keepdir /var/nagios/spool/checkresults
> >     keepdir /var/nagios/archives
> >   }
> > 
> > for the rest of my life.
> 
> And what is wrong with these commands?

To put it bluntly, it's just stupid that I have to say, "yes, I really want these directories, that I already told you I wanted, since I created them in the first place."

In some cases, people will be forced to abandon the default src_install, and instead use

  src_install() {
    default

    # The default src_install creates these, but they're empty so the
    # PM deletes them.
    keepdir /var/log/foo
    keepdir /var/lib/foo
  }

It's not the end of the world, but that approach strikes me as more ugly than the other way around:

  src_install() {
    default
    
    # I really hate .keep files
    rm -r "${ED}/usr/share/locale"
  }

In the latter example, you could fix the upstream build system (best option), or simply tolerate the additional ".keep" file. "Tolerate it" is an option you don't have when your log directory is deleted.
Comment 13 Ciaran McCreesh 2018-01-13 22:13:18 UTC
>  To put it bluntly, it's just stupid that I have to say, "yes, I really want these directories, that I already told you I wanted, since I created them in the first place."

How does the package mangler know that you created them, and that they weren't made by a crappy makefile?
Comment 14 Michał Górny archtester Gentoo Infrastructure gentoo-dev Security 2018-01-13 22:20:32 UTC
And I find magically creating extra files in empty directories (yet not doing anything for non-empty directories!) awfully ugly and surprising by design. It's basically 'wtf is this file doing here? I didn't ask for it! And why it isn't there? Phase of the moon or something?'

if you really want to pursue that, then please get some stats on how many empty directories are installed by ebuilds currently (I think you can parse CONTENTS from vdb for that), with some numbers that include:

a. how many empty directories are installed entirely unnecessarily,

b. how many empty directories should be keepdir'd (+ how many are),

c. how many empty directories need to be kept but are always 'filled in' by some other package (dependency) and therefore don't need keepfiles.
Comment 15 Michael Orlitzky gentoo-dev 2018-01-14 02:31:04 UTC
(In reply to Ciaran McCreesh from comment #13)
> 
> How does the package mangler know that you created them, and that they
> weren't made by a crappy makefile?

How does the package manager know that *anything* in ${D} should be there?

We already count on developers to delete every other type of garbage (like LICENSE files) from ${D} when "make install" puts them there. This change would place empty directories on the same footing, rather than having them be the one class of object whose existence needs to be reaffirmed by the maintainer.


(In reply to Michał Górny from comment #14)
> And I find magically creating extra files in empty directories (yet not
> doing anything for non-empty directories!) awfully ugly and surprising by
> design. It's basically 'wtf is this file doing here? I didn't ask for it!
> And why it isn't there? Phase of the moon or something?'

Says the guy who wants to delete things that I did specifically install =P

I think the ".keep" files are ugly, but they're the lesser of two evils in this case. I'm not going to compile a mountain of statistics... it seems like most of you are against this anyway, and my preference for auto-keepdir is more aesthetic than anything else. A low-effort way to get some stats would be with your QA warning. If no one complains, then you were right.
Comment 16 Ulrich Müller gentoo-dev 2018-03-15 11:18:13 UTC
(In reply to Michael Orlitzky from comment #15)
> I think the ".keep" files are ugly, but they're the lesser of two evils in
> this case. I'm not going to compile a mountain of statistics...

Shall we interpret this as lack of interest? Closing.

> it seems like most of you are against this anyway, and my preference for
> auto-keepdir is more aesthetic than anything else. A low-effort way to get
> some stats would be with your QA warning. If no one complains, then you were
> right.