Variables exported to the environment are limited to MAX_ARG_STRLEN of Linux, which is 128 KiB. In the case of A this can be a problem for ebuilds with many distfiles.
We should probably make a clear definitive list of which variables are exported and which are not. We could also kill A altogether, I suppose it's only needed as a convenience variable for default_src_unpack.
Let me expand on this. In my opinion, having A as 'defined but not exported' is against the principle of the least surprise. This is something people will simply miss or get wrong. Someone implementing a PM on top of this is most likely not to care much whether A is exported or not, and fall into the same trap. In the end, changing this doesn't really solve the problem. It merely shifts the claim from 'ebuild is making unreasonable assumptions' to 'package manager author did not read our fine print'. The end result is the same -- instead of making it easier to write compatible and reliable PMs, PMS just serves as an excuse to justify whatever implementation Portage achieved after working around hundred of issues. In my opinion, this isn't worth the effort. What we should do instead is set a tree policy preventing this issue from happening again. We can change PMS to indicate that exporting it is *optional* but saying it *must not* be exported is just asking people to stand head down.
(In reply to Michał Górny from comment #1) > We could also kill A altogether, I suppose it's > only needed as a convenience variable for default_src_unpack. If we specify that DISTDIR is supposed to contain exactly those files which should be unpacked, then that suffices, and allows us to avoid the need to specify an unexported variable.
(In reply to Zac Medico from comment #3) > (In reply to Michał Górny from comment #1) > > We could also kill A altogether, I suppose it's > > only needed as a convenience variable for default_src_unpack. > > If we specify that DISTDIR is supposed to contain exactly those files which > should be unpacked, then that suffices, and allows us to avoid the need to > specify an unexported variable. unpack ${A} is very common and very convenient. So I guess if the package manager won't define A any longer, then ebuilds would either start defining it themselves, or do horrible things like globbing inside ${DISTDIR}.
(In reply to Michał Górny from comment #2) Closing.
I move that we reconsider this bug, in the interest of long-term scalability. There's a working implementation in the patch for bug 720180, and having that, we need only add a new EAPI which does not export A.
Maybe we should have a tree policy limiting the number of distfiles an ebuild can have? If it's several hundred, then it should be repacked?
As discussed in #gentoo-pms, it is unlikely that we go for a special solution that just unexports A. Instead, we shall reconsider the whole list of variables defined in section 11.1: https://projects.gentoo.org/pms/8/pms.html#x1-10900011.1 One proposal would be to make exporting of all variables listed in that table optional, except HOME, TMPDIR, and possibly ROOT.
Ultimately, manifest bloat will become a show stopper here, which leads to bug 833567 as an alternative solution.
Env variables of single values: sure Env variables of large lists: let's avoid this entirely. As a general solution, instead of ${x}, what if we introduce ${x_ref} which is the name of a file containing all of the entries, one per line. Also provides clarity for the terrible upstreams that want to use spaces in files.
(In reply to Robin Johnson from comment #10) > Also provides clarity for the terrible upstreams that want to use spaces in > files. AFAICS we cannot have whitespace in filenames for multiple reasons: - SRC_URI is whitespace-separated - A space in an URI would have to be escaped by %20, and these files would be downloaded under there literal name - Manifest fields are separated by spaces, and Portage doesn't support the optional encoding from GLEP 74 (i.e. no \x20 or \u0020) Also not sure if we would want support for this. SRC_URI arrows exist, so files can be renamed to something sane.
I believe that the variables listed in table 11.1 can be categorised into four main groups: (1) "Directory type" variables (some of these, like ROOT or EPREFIX, aren't directories themselves, but they are combined with other components to form a directory path): DISTDIR, WORKDIR, S, ROOT, EROOT, SYSROOT, ESYSROOT, BROOT, T, TMPDIR, HOME, EPREFIX, D, ED (2) Variables describing the package name and version: P, PF, PN, CATEGORY, PV, PR, PVR (3) All others in EAPI 8: A, USE, EBUILD_PHASE, EBUILD_PHASE_FUNC, MERGE_TYPE, REPLACING_VERSIONS, REPLACED_BY_VERSION (4) Legacy variables that existed in previous EAPIs: PORTDIR, ECLASSDIR, DESTTREE, INSDESTTREE, AA, KV Some of the variables from (1) are used by external commands (e.g. TMPDIR, HOME, ROOT), others are used by ebuild helpers (e.g. D, ED, EPREFIX, T). Presumably we would want to stay consistent there and also follow the KISS principle, i.e. keep exporting all of them. From group (2) most notably PF is used by dodoc. PN and CATEGORY are used by keepdir. Not entirely sure what to do there, maybe the PM could export __PF as internal variable, or dodoc could get PF via IPC (or is that a crazy idea)? AFAICS there's no need to export any variable from (3).
(In reply to Ulrich Müller from comment #12) Alternatively, we could go for the all-out approach, and stop exporting everything except TMPDIR and HOME (not sure about ROOT). D, T, PF etc. would be retained by the PM as "conceptual variables" (we have that for DESTTREE and INSDESTTREE already). How the PM would pass them to helpers like dodoc would be left to the implementation. (For example, the PM could export a subset, or could write them to a file that would be sourced.)
Unfortunately, "leaving to the implementation" leaves us where we are now, i.e. you have to assume that they may be exported.
(In reply to Michał Górny from comment #14) > Unfortunately, "leaving to the implementation" leaves us where we are now, > i.e. you have to assume that they may be exported. By "implementation defined" I meant either exporting a subset under reserved names (like __E_PF etc.) or whatever other mechanism is suitable. Portage already does this for conceptual variables like DESTTREE (which is exported as _E_DESTTREE).
(In reply to Robin Johnson from comment #10) > As a general solution, instead of ${x}, what if we introduce ${x_ref} which > is the name of a file containing all of the entries, one per line. I think this is very sensible and a common pattern to use when a list of something (typically files) grows too large to be passed reliably command-line argument (or via the process' environment). For example, javac has @filename arguments, which point javac to a file containing a list of files. Ideally the title of this bug should read [Future EAPI] Don't export A, instead export ARCHIVES_FILE pointing to a file containing all entries of A Once we have ARCHIVES_FILE - I leave the name of the variable up to future bikeshedding - the default implementation of src_unpack() in a future EAPI could be simply __eapi9_src_unpack() { mapfile A "${ARCHIVES_FILE}" unpack "${A}" }
(In reply to Florian Schmaus from comment #16) > __eapi9_src_unpack() { > mapfile A "${ARCHIVES_FILE}" > unpack "${A}" > } The problem is that the unpack command may run into a related problem depending on ARG_MAX of the system, i.e. we cannot assume unlimited length of command line arguments.
(In reply to Florian Schmaus from comment #16) > unpack "${A}" Also, should be ${A}, not "${A}".
(In reply to Ulrich Müller from comment #17) > (In reply to Florian Schmaus from comment #16) > > __eapi9_src_unpack() { > > mapfile A "${ARCHIVES_FILE}" > > unpack "${A}" > > } > > The problem is that the unpack command may run into a related problem > depending on ARG_MAX of the system, i.e. we cannot assume unlimited length > of command line arguments. Right, however, this is only problematic if `unpack` is an external binary as opposed to a shell function (which it currently is). And it seems no like something we can not address. The obvious solutions include - enforcing that `unpack` is a shell function - define a new command (unpack_archives) or extend unpack to read the files to unpack from a file Actually, the latter, here shown re-use the '@' notation already used elsewhere for such cases, seems pretty nice: __eapi9_src_unpack() { unpack @"${ARCHIVES_FILE}" }
(In reply to Florian Schmaus from comment #19) > Right, however, this is only problematic if `unpack` is an external binary > as opposed to a shell function (which it currently is). It isn't: "Except where otherwise noted, they may be internal (shell functions or aliases) or external commands available in PATH; where this is not specified, ebuilds may not rely upon either behaviour." https://projects.gentoo.org/pms/8/pms.html#x1-12000012.3
(In reply to Ulrich Müller from comment #20) > (In reply to Florian Schmaus from comment #19) > > Right, however, this is only problematic if `unpack` is an external binary > > as opposed to a shell function (which it currently is). > > It isn't: "Except where otherwise noted, they may be internal (shell > functions or aliases) or external commands available in PATH; where this is > not specified, ebuilds may not rely upon either behaviour." > https://projects.gentoo.org/pms/8/pms.html#x1-12000012.3 I am sorry, I was not clear. Yes, PMS allows `unpack` to be anything that can be invoked via bash. What I meant is that `unpack` is currently implemented as shell function in portage (unless I am mistaken) and that one option to avoid the large process environment when spawning unpack would be to mandate by PMS that `unpack` has to be a shell function in future EAPIs. That said, that would not be my preferred way forward. An unpack-helper that takes a file, containing the archives to unpack, as argument seems more sensible.
Pull request, currently containing a proof-of-concept implementation, at https://github.com/gentoo/portage/pull/1357. As discussed in #-pms, since we touch A anyways, we could also consider converting it to a bash array. Looking at the current PoC code, it should not be that hard™.
See comment #8. Any implementation should consider _all_ variables defined in table 11.1 <https://projects.gentoo.org/pms/8/pms.html#x1-10900011.1>, plus possibly ECLASS, INHERITED and DEFINED_PHASES which are defined in section 7.4 <https://projects.gentoo.org/pms/8/pms.html#x1-700007.4>. Exceptions are HOME and TMPDIR.