
Bug 260118

Summary: [Future EAPI] "docompress" compression handling (prepalldocs replacement)
Product: Gentoo Hosted Projects Reporter: Ulrich Müller <ulm>
Component: PMS/EAPI Assignee: PMS/EAPI <pms>
Status: RESOLVED FIXED    
Severity: normal CC: alonbl, dev-portage, ferringb, pacho, pva
Priority: High    
Version: unspecified   
Hardware: All   
OS: All   
Whiteboard: in-eapi-4
Package list:
Runtime testing required: ---
Bug Depends on:    
Bug Blocks: 174380, 262365    
Attachments: Patch for pkg-mgr-commands.tex
Patch for pkg-mgr-commands.tex
Option -s, patch for pkg-mgr-commands.tex

Description Ulrich Müller gentoo-dev 2009-02-24 11:55:06 UTC
Following the long discussion in bug 250077 about prepalldocs, picking up ideas from alonbl and vapier in bug 164114, and taking some feedback by ciaranm, dleverton, ferringb and zmedico on -dev (see <http://archives.gentoo.org/gentoo-dev/msg_2fd5f58132881ef69219c126a525bce3.xml>) and #-portage into account, I'd like to propose the following extension for a future EAPI:

Package managers may optionally compress a subset of the files installed in ${D}, subject to the following rules:

The package manager (if supporting compression) internally maintains two lists of absolute paths, both having a default value, plus the possibility to modify them from ebuilds/eclasses:

   - an inclusion list, initially containing:
     /usr/share/doc /usr/share/info /usr/share/man
   - an exclusion list, initially containing:
     /usr/share/doc/${PF}/html

Ebuilds can use the command "docompress" to add items to the lists:

   - "docompress" (without option): add paths (directories or files)
     to the inclusion list
   - "docompress -x": add paths to the exclusion list

Note: Both operations may be combined in one command, i.e. "docompress <incl1> <incl2> ... -x <excl1> <excl2> ..."

Note: For package managers not supporting compression, "docompress" would be a no-op.
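
The list handling proposed above can be sketched roughly as follows. This is only an illustration in plain sh, not how any package manager implements it; the variable names INCLUDE_LIST and EXCLUDE_LIST and the sample value of PF are assumptions made for the example.

```sh
#!/bin/sh
# Hypothetical model of the two lists. A real package manager keeps them
# internally; newline-separated strings are used here for simplicity, so
# paths containing newlines are not handled.
PF="foo-1.0"    # example value, normally set by the package manager
INCLUDE_LIST="/usr/share/doc
/usr/share/info
/usr/share/man"
EXCLUDE_LIST="/usr/share/doc/${PF}/html"

# docompress [<incl>...] [-x <excl>...]
# Arguments before -x are added to the inclusion list, arguments after it
# to the exclusion list. Nothing is compressed here; only the lists change.
docompress() {
    target=include
    for path in "$@"; do
        if [ "$path" = "-x" ]; then
            target=exclude
            continue
        fi
        if [ "$target" = include ]; then
            INCLUDE_LIST="${INCLUDE_LIST}
${path}"
        else
            EXCLUDE_LIST="${EXCLUDE_LIST}
${path}"
        fi
    done
}

# Combined form from the note above: one inclusion, one exclusion.
docompress /opt/foo/man -x "/usr/share/doc/${PF}/examples"

printf 'include:\n%s\n' "$INCLUDE_LIST"
printf 'exclude:\n%s\n' "$EXCLUDE_LIST"
```

The point of the sketch is that docompress only records paths; whether and how anything gets compressed is decided later by the package manager.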

Between src_install and the phase immediately following it (pkg_preinst for current EAPIs), the package manager may compress all files and directories in the inclusion list, but it must exclude anything listed in the exclusion list.

Note: "Compressing a directory" means compressing all files in this directory and in its (any level) sub-directories.

Note: The package manager may apply additional restrictions based on users' settings (e.g. PORTAGE_COMPRESS and PORTAGE_COMPRESS_EXCLUDE_SUFFIXES in the Portage case).

The two most common usage cases:

   1. dodoc <file1> <file2> ...

and

   2. emake ... docdir="/usr/share/doc/${PF}" install

would just work as expected, without any need for calling docompress.

A package that needs uncompressed documentation would simply call "docompress -x /usr/share/doc" or "docompress -x /usr/share/doc/${PF}/<some subdir>".


Open questions: 
1. Do we need globbing or regexp support? I think not; IMHO we should keep
   things simple. (But it may be needed for 2.)
2. Portage currently uses some heuristics to find "man" directories outside
   of /usr/share/man (basically it looks for anything named "man" with a
   first or second level subdir named "man*"). Should a similar functionality
   be supported, or is it acceptable that ebuilds requiring this would call
   docompress on the respective path?
3. Should commands like "dohtml" implicitly modify the inclusion and exclusion
   lists (makes the implementation less clean, but may be convenient)?
Comment 1 Brian Harring (RETIRED) gentoo-dev 2009-02-24 12:11:24 UTC
Note for #3, dohtml is a separate process in most (if not all) implementations, meaning that having it modify that list is nasty (requires file trickery).
Comment 2 Petteri Räty (RETIRED) gentoo-dev 2009-02-24 12:41:09 UTC
(In reply to comment #1)
> Note for #3, dohtml is a seperate process in most (if not all) implementations-
> meaning having it modify that list is nasty (requires file trickery).
> 

To stay compatible with things like xargs, helpers that are external commands in Portage probably have to be external in other PMs too.

(In reply to comment #0)
> 
> Open questions: 
> 1. Do we need globbing or regexp support? I think not, IMHO we should keep
>    things simple. (But may be needed for 2.)
>

People using docompress can just make use of the globbing and regexp support in bash IMHO.
Comment 3 Ulrich Müller gentoo-dev 2009-02-28 20:44:11 UTC
Thanks for the replies. So I assume that:
1. globbing or regexp support is not needed
2. no special "man" heuristics should be added (but no answers to this question)
3. lists shouldn't be modified by side-effects

So I think the next step is to write a patch for PMS. Where should the information be placed? Is a new section in chapter "The Ebuild Environment" appropriate (plus a short description of "docompress" under "Installation commands")?


Just as a remark, today I accidentally came across bug 176411 which shows that a more general compression control will be useful in other cases, too.
Comment 4 Ciaran McCreesh 2009-02-28 20:52:49 UTC
The next step's to wait until the call for things to go into EAPI 3 comes along. The process seems to be something like:

* People submit future EAPI bugs for things they like.
* Discussion on said bugs.
* Some of them get implemented in Portage, but not turned on.
* A proposal for "things in EAPI N" is sent to -dev@, and from there to the Council.
* Someone (it's mostly been me in the past, but others are welcome to help) comes up with formal wording for PMS, sticks it in a branch and gets Council approval for the merge to master.

So whilst a formal PMS-style description would be helpful for discussion and approval, we'd not include it in a branch of PMS until we're pretty certain we know exactly what EAPI 3 is going to be.
Comment 5 Ciaran McCreesh 2009-03-13 18:00:17 UTC
I'm working on an EAPI 3 "very rough draft that will probably have to have things taken out of it if they can't be done in time" locally. I've provisionally put in the following; comments welcome.

\subsubsection{Commands affecting install compression}
In EAPIs listed in table~\ref{tab:compression-table} as supporting controllable compression, the
package manager may optionally compress a subset of the files under the \t{D} directory. To control
which directories may or may not be compressed, the package manager shall maintain two lists:

\begin{compactitem}
\item An inclusion list, which initially contains \t{/usr/share/doc}, \t{/usr/share/info} and
    \t{/usr/share/man}.
\item An exclusion list, which initially contains \t{/usr/share/doc/\$\{PF\}/html}.
\end{compactitem}

The optional compression shall be carried out after \t{src\_install} has completed, and before the
execution of any subsequent phase function. For each item in the inclusion list, pretend it has the
value of the \t{D} variable prepended, then:

\begin{compactitem}
\item If it is a directory, act as if every file or directory immediately under this directory
    were in the inclusion list.
\item If the item is a file, it may be compressed unless it has been excluded as described below.
\item If the item does not exist, it is ignored.
\end{compactitem}

Whether an item is to be excluded is determined as follows: For each item in the exclusion list,
pretend it has the value of the \t{D} variable prepended, then:

\begin{compactitem}
\item If it is a directory, act as if every file or directory immediately under this directory
    were in the exclusion list.
\item If the item is a file, it shall not be compressed.
\item If the item does not exist, it is ignored.
\end{compactitem}

The package manager shall take appropriate steps to ensure that its compression mechanisms behave
sensibly even if an item is listed in the inclusion list multiple times, or if an item is a symlink.

The following commands may be used in \t{src\_install} to alter these lists. It is an error to call
any of these functions from any other phase.

\begin{description}
\item[docompress] If the first argument is \t{-x}, add each of its subsequent arguments to the
exclusion list. Otherwise, add each argument to the inclusion list. Only available in EAPIs listed
in table~\ref{tab:compression-table} as supporting \t{docompress}.
\end{description}

\begin{centertable}{EAPIs supporting controllable compression} \label{tab:compression-table}
\IFKDEBUILDELSE
{
    \begin{tabular}{ l l l }
        \toprule
            \multicolumn{1}{c}{\textbf{EAPI}} &
            \multicolumn{1}{c}{\textbf{Supports controllable compression?}} &
            \multicolumn{1}{c}{\textbf{Supports \t{docompress}?}} \\
            \midrule
    \t{0} & No & No \\
    \t{1} & No & No \\
    \t{kdebuild-1} & No & No \\
    \t{2} & No & No \\
    \t{3} & Yes & Yes \\
    \bottomrule
    \end{tabular}
}{
    \begin{tabular}{ l l l }
        \toprule
            \multicolumn{1}{c}{\textbf{EAPI}} &
            \multicolumn{1}{c}{\textbf{Supports controllable compression?}} &
            \multicolumn{1}{c}{\textbf{Supports \t{docompress}?}} \\
            \midrule
    \t{0} & No & No \\
    \t{1} & No & No \\
    \t{2} & No & No \\
    \t{3} & Yes & Yes \\
    \bottomrule
    \end{tabular}
}
\end{centertable}
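
As an illustration only (not part of the PMS text), the selection rules in the draft above could be sketched in plain sh as below. All names (INCLUDE_LIST, EXCLUDE_LIST, is_excluded, list_candidates, list_compressible) are hypothetical, D is a temporary stand-in directory without a trailing slash, and the sketch merely lists the files that may be compressed rather than compressing them.

```sh
#!/bin/sh
# Sketch of the draft wording: decide which files under D may be compressed.
PF="foo-1.0"                 # example value
D=$(mktemp -d)               # stand-in image directory, no trailing slash
INCLUDE_LIST="/usr/share/doc
/usr/share/info
/usr/share/man"
EXCLUDE_LIST="/usr/share/doc/${PF}/html"

# Draft semantics: -x is only recognised as the first argument.
docompress() {
    if [ "$1" = "-x" ]; then
        shift
        for p in "$@"; do
            EXCLUDE_LIST="${EXCLUDE_LIST}
${p}"
        done
    else
        for p in "$@"; do
            INCLUDE_LIST="${INCLUDE_LIST}
${p}"
        done
    fi
}

# Succeeds if the file is an exclusion entry (with D prepended) or lies below one.
is_excluded() {
    while IFS= read -r entry; do
        [ -n "$entry" ] || continue
        case "$1" in
            "${D}${entry}" | "${D}${entry}"/*) return 0 ;;
        esac
    done <<EOF
$EXCLUDE_LIST
EOF
    return 1
}

# Prints every regular file found under an inclusion entry.
list_candidates() {
    while IFS= read -r entry; do
        [ -n "$entry" ] || continue
        [ -e "${D}${entry}" ] || continue   # nonexistent items are ignored
        find "${D}${entry}" -type f         # directories are walked recursively
    done <<EOF
$INCLUDE_LIST
EOF
}

# Prints, relative to D, the files that may be compressed.
# sort -u drops duplicates caused by overlapping inclusion entries.
list_compressible() {
    list_candidates | LC_ALL=C sort -u | while IFS= read -r file; do
        is_excluded "$file" || printf '%s\n' "${file#"$D"}"
    done
}

# Small sample image to exercise the rules.
mkdir -p "$D/usr/share/doc/$PF/html" "$D/usr/share/doc/$PF/examples" \
         "$D/usr/share/man/man1" "$D/opt/foo/man/man1"
touch "$D/usr/share/doc/$PF/README" "$D/usr/share/doc/$PF/html/index.html" \
      "$D/usr/share/doc/$PF/examples/demo.sh" \
      "$D/usr/share/man/man1/foo.1" "$D/opt/foo/man/man1/foo.1"

docompress /opt/foo/man
docompress -x "/usr/share/doc/${PF}/examples"

RESULT=$(list_compressible)
printf '%s\n' "$RESULT"
rm -rf "$D"
```

With this sample tree the sketch lists README and both man pages, but neither html/index.html nor examples/demo.sh. Since find -type f does not follow symlinks, symlinked files are simply skipped here; the draft leaves the exact symlink handling to the package manager.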
Comment 6 Peter Volkov (RETIRED) gentoo-dev 2009-03-17 07:21:23 UTC
(In reply to comment #5)
> \item If the item does not exist, it is ignored.

It's probably a good idea to specify how docompress should behave on nonexistent directories. Should it die or issue a warning?
Comment 7 Ulrich Müller gentoo-dev 2009-03-17 07:48:44 UTC
(In reply to comment #5)
> To control which directories may or may not be compressed, the package
> manager shall maintain two lists:

Does PMS obey RFC 2119 for the words "may" and "must"? If yes, then the above "may not" must ;-) be changed. Non-compression is mandatory if the ebuild requests it.

> \item If it is a directory, act as if every file or directory immediately
>       under this directory were in the inclusion list.

Why do you need the word "immediately" here?

> \item If it is a directory, act as if every file or directory immediately
>       under this directory were in the exclusion list.

Ditto.

> \item[docompress] If the first argument is \t{-x}, add each of its subsequent
> arguments to the exclusion list. Otherwise, add each argument to the
> inclusion list.

This differs from what I had proposed (namely to allow "docompress <incl> ... -x <excl> ..."), but I don't have a strong opinion here. So go ahead.


(In reply to comment #6)
> > \item If the item does not exist, it is ignored.
> 
> Probably it's good idea to specify how docompress should behave on
> nonexistent directories. Should it die or issue warning?

Good point. docompress could return a bad status if it meets a non-existent item (file or directory). I wouldn't die on it though, since the command just manipulates the lists, so in principle a directory may still be created after docompress was called.
Comment 8 Ulrich Müller gentoo-dev 2009-03-17 09:48:51 UTC
One thing is missing: The ebuild may want to know if compression is enabled, and which compression scheme is used by the package manager. Could you add the following sentence to the description of the "docompress" command? It should be trivial to implement, e.g. Portage would just return ${PORTAGE_COMPRESS}.

"If the first argument is \t{-s}, output the name of the compression program used, or an empty string if compression is not enabled."
Comment 9 Ciaran McCreesh 2009-03-17 13:55:48 UTC
(In reply to comment #6)
> Probably it's good idea to specify how docompress should behave on nonexistent
> directories. Should it die or issue warning?

Neither. Otherwise there'd be a horrible mess for the default list, that includes things that aren't necessarily there.

I don't see docompress as something that does compression. It's something that sets up a list that tells the package manager what it could compress in the future, so invalid inputs to it would be things that don't make sense, not things that don't exist. Perhaps 'cancompress' would be a better name...

(In reply to comment #7)
> Does PMS obey RFC 2119 for the words "may" and "must"?

No.

> > \item If it is a directory, act as if every file or directory immediately
> >       under this directory were in the inclusion list.
> 
> Why do you need the word "immediately" here?

It's easier to define the behaviour recursively than the data recursively.

> This differs from what I had proposed (namely to allow "docompress <incl> ...
> -x <excl> ..."), but I don't have a strong opinion here. So go ahead.

If we get into that, we also have to start caring about handling -- and so on. Currently ebuild utilities that take arguments for themselves mostly just take them as the first argument to avoid complications.

(In reply to comment #8)
> One thing is missing: The ebuild may want to know if compression is enabled,
> and which compression scheme is used by the package manager. Could you add the
> following sentence to the description of the "docompress" command? It should be
> trivial to implement, e.g. Portage would just return ${PORTAGE_COMPRESS}.
> 
> "If the first argument is \t{-s}, output the name of the compression program
> used, or an empty string if compression is not enabled."

I don't think we should be doing that. If we did, ebuilds would need to know how to handle every possible compression program the user might tell the package manager to use.

The point of docompress is that it's transparent to the ebuild.
Comment 10 Ulrich Müller gentoo-dev 2009-03-17 14:58:53 UTC
(In reply to comment #9)
> Perhaps 'cancompress' would be a better name...

No, that's the name of a PHP function (with a different meaning).


> > "If the first argument is \t{-s}, output the name of the compression
> > program used, or an empty string if compression is not enabled."

> I don't think we should be doing that. If we did, ebuilds would need to know
> how to handle every possible compression program the user might tell the
> package manager to use.

No, they would only need to know what compression schemes their package can handle, which should be a well-defined set.

I had thought of the case that a package works only with one particular compression scheme. For example, some programs (like sci-mathematics/maxima) can read their gzipped Info files, but not bzip2ed ones. In that case one could do something like:

   [[ $(docompress -s) == gzip ]] || docompress -x /usr/share/info

Too messy?
Comment 11 Ciaran McCreesh 2009-03-17 15:14:06 UTC
(In reply to comment #10)
> (In reply to comment #9)
> > Perhaps 'cancompress' would be a better name...
> 
> No, that's the name of a PHP function (with a different meaning).

ecancompress? ecompressable?

> > I don't think we should be doing that. If we did, ebuilds would need to know
> > how to handle every possible compression program the user might tell the
> > package manager to use.
> 
> No, they would only need to know what compression schemes their package can
> handle, which should be a well-defined set.

It's not a well defined set. It's a constantly moving target. For example, people would probably already have started using xz, and would have been using lzma previously, both of which are recent.

There's already a huge nasty EAPI dependent extension list for unpack... Don't really want to make it worse.

> I had thought of the case that a package works only with one particular
> compression scheme. For example, some programs (like sci-mathematics/maxima)
> can read their gzipped Info files, but not bzip2ed ones. In that case one could
> do something like:
> 
>    [[ $(docompress -s) == gzip ]] || docompress -x /usr/share/info
> 
> Too messy?

If you need a particular kind (or can't use a particular kind) of compression, you shouldn't be using docompress.
Comment 12 Ulrich Müller gentoo-dev 2009-03-17 15:59:10 UTC
> ecancompress? ecompressable?

Too long, for my taste (and shouldn't it be "ecompressible"?). And there's no other install function starting with "e".

How about "fcompress"? We have "fowners" and "fperms" already.
Comment 13 Ciaran McCreesh 2009-03-17 16:12:22 UTC
(In reply to comment #12)
> > ecancompress? ecompressable?
> 
> Too long, for my taste (and shouldn't it be "ecompressible"?). And there's no
> other install function starting with "e".
> 
> How about "fcompress"? We have "fowners" and "fperms" already.

The 'f' functions do things immediately (as do 'do' functions).

Googling 'cancompress' suggests that it's just some random PHP library using that name, so it doesn't really matter that there's a collision.
Comment 14 Ulrich Müller gentoo-dev 2009-03-22 10:54:03 UTC
Created attachment 185847
Patch for pkg-mgr-commands.tex

Another point: Files could be already compressed and the package manager should handle this in a sensible way. See attached patch.
Comment 15 Ciaran McCreesh 2009-03-22 14:57:45 UTC
(In reply to comment #14)
> Created an attachment (id=185847) [edit]
> Patch for pkg-mgr-commands.tex

Could you git format-patch that please?

Comment 16 Ulrich Müller gentoo-dev 2009-03-22 16:05:21 UTC
Created attachment 185886
Patch for pkg-mgr-commands.tex

> Could you git format-patch that please?

Voila.
Comment 17 Ciaran McCreesh 2009-03-26 00:17:16 UTC
Applied, thanks
Comment 18 Ulrich Müller gentoo-dev 2009-06-30 09:39:04 UTC
*** Bug 164114 has been marked as a duplicate of this bug. ***
Comment 19 Ulrich Müller gentoo-dev 2010-08-27 07:23:25 UTC
Two questions arose during implementation:

1. What should be done with arguments of docompress that are relative
   pathnames?
   a) Ignore (and emit a warning)?
   b) Interpret relative to ${D}?
   c) Interpret relative to ${INSDESTTREE}?

2. How should symlinks in the inclusion and exclusion lists be handled?
   So far, I expand them up to (and including) the last directory component
   of the path. Is this reasonable?
Comment 20 Zac Medico gentoo-dev 2010-08-27 21:01:05 UTC
(In reply to comment #19)
> 1. What should be done with arguments of docompress that are relative
>    pathnames?
>    a) Ignore (and emit a warning)?
>    b) Interpret relative to ${D}?
>    c) Interpret relative to ${INSDESTTREE}?

I'd go with the principle of least-surprise and make it relative to ./ since that's typically how command-line tools behave. Even if that behavior doesn't make much sense in this context, perhaps it's best to simply generate an error if the path can't be resolved.

> 2. How should symlinks in the inclusion and exclusion lists be handled?
>    So far, I expand them up to (and including) the last directory component
>    of the path. Is this reasonable?

Again, I think it's reasonable to generate an error if given invalid input. Even if the error can't be generated until after src_install, it's fine as long as we give an appropriate message so that the ebuild dev knows how to correct the problem.
Comment 21 Zac Medico gentoo-dev 2010-08-27 22:14:05 UTC
(In reply to comment #20)
> I'd go with the principle of least-surprise and make it relative to ./ since
> that's typically how command-line tools behave.

Maybe that's a bad idea given the way that some helpers use INSDESTTREE though...
Comment 22 Ulrich Müller gentoo-dev 2010-08-27 23:30:05 UTC
(In reply to comment #21)
> > I'd go with the principle of least-surprise and make it relative to ./
> > since that's typically how command-line tools behave.
> 
> Maybe that's a bad idea given the way that some helpers use INSDESTTREE
> though...

And some (most?) others like dodir, dosym, and *into use D. This is also what is prepended to absolute path arguments of docompress.

(In reply to comment #20)
> Again, I think it's reasonable to generate an error if given invalid input.
> Even if the error can't be generated until after src_install, it's fine as
> long as we give an appropriate message so that the ebuild dev knows how to
> correct the problem.

I could add some ewarn messages if a given path doesn't exist after the src_install phase. (There shouldn't be a warning for the default entries /usr/share/{doc,info,man} though.)
Comment 23 Ulrich Müller gentoo-dev 2010-08-28 15:57:29 UTC
(In reply to comment #19)
> 1. What should be done with arguments of docompress that are relative
>    pathnames?

It turns out that the spec already answers this question (but somehow I had missed it):
# For each item in the inclusion list, pretend it has the value of the \t{ED}
# variable in offset-prefix aware EAPIs or the \t{D} variable in offset-prefix
# agnostic EAPIs prepended,

So all pathnames should be interpreted relative to D or ED.

Sorry for the confusion.
Comment 24 Ulrich Müller gentoo-dev 2010-08-29 08:08:30 UTC
(In reply to comment #19)
> 2. How should symlinks in the inclusion and exclusion lists be handled?

Going for the "principle of least surprise" for that too, symlinks will be fully expanded.
Comment 25 Ulrich Müller gentoo-dev 2010-09-14 17:07:44 UTC
Some ebuilds are currently doing things like:

    suffix=$(ecompress --suffix)
    elog "See /usr/share/doc/${PF}/README${suffix} for documentation."

or use the suffix for other purposes. I think that it cannot harm to have a replacement for this functionality; ecompress shouldn't be called from an ebuild directly. I've discussed the matter with zmedico and suggest that the following sentence is added to the description of the docompress command:

"If the first argument is \t{-s}, output the suffix used for compressed files (e.\,g.\ \t{.bz2}); no further arguments are allowed in this case."

This is already in portage:
http://git.overlays.gentoo.org/gitweb/?p=proj/portage.git;a=commit;h=f2375609adc80ebe5395d84902af4045ecea2f73
Comment 26 Ulrich Müller gentoo-dev 2010-09-16 07:00:07 UTC
Created attachment 247545
Option -s, patch for pkg-mgr-commands.tex

Here is the corresponding patch. I've reworded the whole paragraph, so that the behaviour of the command without an option is described first, then options -x and -s.
Comment 27 Ulrich Müller gentoo-dev 2010-09-20 07:09:45 UTC
(In reply to comment #25)
> "If the first argument is \t{-s}, output the suffix used for compressed files
> (e.\,g.\ \t{.bz2}); no further arguments are allowed in this case."

I withdraw this.

Considering that there are user-configurable exceptions to compression like PORTAGE_COMPRESS_EXCLUDE_SUFFIXES, I come to the conclusion that such an option could not work in a reliable way.
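
A rough sketch of the problem, under assumed values: SUFFIX stands for what a hypothetical "docompress -s" would print, and EXCLUDE_SUFFIXES is an example user setting (plain suffixes here, not Portage's actual defaults or matching rules). The helper installed_name is made up for the illustration.

```sh
#!/bin/sh
SUFFIX=".bz2"                      # assumed output of a hypothetical "docompress -s"
EXCLUDE_SUFFIXES="pdf png html"    # example user setting, for illustration only

# Prints the name a file would end up with after the optional compression.
installed_name() {
    for s in $EXCLUDE_SUFFIXES; do
        case "$1" in
            *."$s") printf '%s\n' "$1"; return 0 ;;   # left uncompressed
        esac
    done
    printf '%s\n' "${1}${SUFFIX}"
}

installed_name README        # gets the suffix appended
installed_name manual.pdf    # keeps its name; blindly appending ${SUFFIX} would be wrong
```

An ebuild that only knows the suffix cannot tell these two cases apart, because the exceptions are the user's configuration and not visible to the ebuild.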
Comment 28 Ulrich Müller gentoo-dev 2010-12-30 19:53:12 UTC
EAPI 4 has been approved by the council.