Variables exported to the environment are limited to MAX_ARG_STRLEN of Linux, which is 128 KiB. In the case of A this can be a problem for ebuilds with many distfiles.
We should probably make a clear definitive list of which variables are exported and which are not. We could also kill A altogether, I suppose it's only needed as a convenience variable for default_src_unpack.
Let me expand on this. In my opinion, having A as 'defined but not exported' is against the principle of the least surprise. This is something people will simply miss or get wrong. Someone implementing a PM on top of this is most likely not to care much whether A is exported or not, and fall into the same trap. In the end, changing this doesn't really solve the problem. It merely shifts the claim from 'ebuild is making unreasonable assumptions' to 'package manager author did not read our fine print'. The end result is the same -- instead of making it easier to write compatible and reliable PMs, PMS just serves as an excuse to justify whatever implementation Portage achieved after working around hundred of issues. In my opinion, this isn't worth the effort. What we should do instead is set a tree policy preventing this issue from happening again. We can change PMS to indicate that exporting it is *optional* but saying it *must not* be exported is just asking people to stand head down.
(In reply to Michał Górny from comment #1) > We could also kill A altogether, I suppose it's > only needed as a convenience variable for default_src_unpack. If we specify that DISTDIR is supposed to contain exactly those files which should be unpacked, then that suffices, and allows us to avoid the need to specify an unexported variable.
(In reply to Zac Medico from comment #3) > (In reply to Michał Górny from comment #1) > > We could also kill A altogether, I suppose it's > > only needed as a convenience variable for default_src_unpack. > > If we specify that DISTDIR is supposed to contain exactly those files which > should be unpacked, then that suffices, and allows us to avoid the need to > specify an unexported variable. unpack ${A} is very common and very convenient. So I guess if the package manager won't define A any longer, then ebuilds would either start defining it themselves, or do horrible things like globbing inside ${DISTDIR}.
(In reply to Michał Górny from comment #2) Closing.
I move that we reconsider this bug, in the interest of long-term scalability. There's a working implementation in the patch for bug 720180, and having that, we need only add a new EAPI which does not export A.
Maybe we should have a tree policy limiting the number of distfiles an ebuild can have? If it's several hundred, then it should be repacked?
As discussed in #gentoo-pms, it is unlikely that we go for a special solution that just unexports A. Instead, we shall reconsider the whole list of variables defined in section 11.1: https://projects.gentoo.org/pms/8/pms.html#x1-10900011.1 One proposal would be to make exporting of all variables listed in that table optional, except HOME, TMPDIR, and possibly ROOT.
Ultimately, manifest bloat will become a show stopper here, which leads to bug 833567 as an alternative solution.
Env variables of single values: sure Env variables of large lists: let's avoid this entirely. As a general solution, instead of ${x}, what if we introduce ${x_ref} which is the name of a file containing all of the entries, one per line. Also provides clarity for the terrible upstreams that want to use spaces in files.
(In reply to Robin Johnson from comment #10) > Also provides clarity for the terrible upstreams that want to use spaces in > files. AFAICS we cannot have whitespace in filenames for multiple reasons: - SRC_URI is whitespace-separated - A space in an URI would have to be escaped by %20, and these files would be downloaded under there literal name - Manifest fields are separated by spaces, and Portage doesn't support the optional encoding from GLEP 74 (i.e. no \x20 or \u0020) Also not sure if we would want support for this. SRC_URI arrows exist, so files can be renamed to something sane.
I believe that the variables listed in table 11.1 can be categorised into four main groups: (1) "Directory type" variables (some of these, like ROOT or EPREFIX, aren't directories themselves, but they are combined with other components to form a directory path): DISTDIR, WORKDIR, S, ROOT, EROOT, SYSROOT, ESYSROOT, BROOT, T, TMPDIR, HOME, EPREFIX, D, ED (2) Variables describing the package name and version: P, PF, PN, CATEGORY, PV, PR, PVR (3) All others in EAPI 8: A, USE, EBUILD_PHASE, EBUILD_PHASE_FUNC, MERGE_TYPE, REPLACING_VERSIONS, REPLACED_BY_VERSION (4) Legacy variables that existed in previous EAPIs: PORTDIR, ECLASSDIR, DESTTREE, INSDESTTREE, AA, KV Some of the variables from (1) are used by external commands (e.g. TMPDIR, HOME, ROOT), others are used by ebuild helpers (e.g. D, ED, EPREFIX, T). Presumably we would want to stay consistent there and also follow the KISS principle, i.e. keep exporting all of them. From group (2) most notably PF is used by dodoc. PN and CATEGORY are used by keepdir. Not entirely sure what to do there, maybe the PM could export __PF as internal variable, or dodoc could get PF via IPC (or is that a crazy idea)? AFAICS there's no need to export any variable from (3).
(In reply to Ulrich Müller from comment #12) Alternatively, we could go for the all-out approach, and stop exporting everything except TMPDIR and HOME (not sure about ROOT). D, T, PF etc. would be retained by the PM as "conceptual variables" (we have that for DESTTREE and INSDESTTREE already). How the PM would pass them to helpers like dodoc would be left to the implementation. (For example, the PM could export a subset, or could write them to a file that would be sourced.)
Unfortunately, "leaving to the implementation" leaves us where we are now, i.e. you have to assume that they may be exported.
(In reply to Michał Górny from comment #14) > Unfortunately, "leaving to the implementation" leaves us where we are now, > i.e. you have to assume that they may be exported. By "implementation defined" I meant either exporting a subset under reserved names (like __E_PF etc.) or whatever other mechanism is suitable. Portage already does this for conceptual variables like DESTTREE (which is exported as _E_DESTTREE).
(In reply to Robin Johnson from comment #10) > As a general solution, instead of ${x}, what if we introduce ${x_ref} which > is the name of a file containing all of the entries, one per line. I think this is very sensible and a common pattern to use when a list of something (typically files) grows too large to be passed reliably command-line argument (or via the process' environment). For example, javac has @filename arguments, which point javac to a file containing a list of files. Ideally the title of this bug should read [Future EAPI] Don't export A, instead export ARCHIVES_FILE pointing to a file containing all entries of A Once we have ARCHIVES_FILE - I leave the name of the variable up to future bikeshedding - the default implementation of src_unpack() in a future EAPI could be simply __eapi9_src_unpack() { mapfile A "${ARCHIVES_FILE}" unpack "${A}" }
(In reply to Florian Schmaus from comment #16) > __eapi9_src_unpack() { > mapfile A "${ARCHIVES_FILE}" > unpack "${A}" > } The problem is that the unpack command may run into a related problem depending on ARG_MAX of the system, i.e. we cannot assume unlimited length of command line arguments.
(In reply to Florian Schmaus from comment #16) > unpack "${A}" Also, should be ${A}, not "${A}".
(In reply to Ulrich Müller from comment #17) > (In reply to Florian Schmaus from comment #16) > > __eapi9_src_unpack() { > > mapfile A "${ARCHIVES_FILE}" > > unpack "${A}" > > } > > The problem is that the unpack command may run into a related problem > depending on ARG_MAX of the system, i.e. we cannot assume unlimited length > of command line arguments. Right, however, this is only problematic if `unpack` is an external binary as opposed to a shell function (which it currently is). And it seems no like something we can not address. The obvious solutions include - enforcing that `unpack` is a shell function - define a new command (unpack_archives) or extend unpack to read the files to unpack from a file Actually, the latter, here shown re-use the '@' notation already used elsewhere for such cases, seems pretty nice: __eapi9_src_unpack() { unpack @"${ARCHIVES_FILE}" }
(In reply to Florian Schmaus from comment #19) > Right, however, this is only problematic if `unpack` is an external binary > as opposed to a shell function (which it currently is). It isn't: "Except where otherwise noted, they may be internal (shell functions or aliases) or external commands available in PATH; where this is not specified, ebuilds may not rely upon either behaviour." https://projects.gentoo.org/pms/8/pms.html#x1-12000012.3
(In reply to Ulrich Müller from comment #20) > (In reply to Florian Schmaus from comment #19) > > Right, however, this is only problematic if `unpack` is an external binary > > as opposed to a shell function (which it currently is). > > It isn't: "Except where otherwise noted, they may be internal (shell > functions or aliases) or external commands available in PATH; where this is > not specified, ebuilds may not rely upon either behaviour." > https://projects.gentoo.org/pms/8/pms.html#x1-12000012.3 I am sorry, I was not clear. Yes, PMS allows `unpack` to be anything that can be invoked via bash. What I meant is that `unpack` is currently implemented as shell function in portage (unless I am mistaken) and that one option to avoid the large process environment when spawning unpack would be to mandate by PMS that `unpack` has to be a shell function in future EAPIs. That said, that would not be my preferred way forward. An unpack-helper that takes a file, containing the archives to unpack, as argument seems more sensible.
Pull request, currently containing a proof-of-concept implementation, at https://github.com/gentoo/portage/pull/1357. As discussed in #-pms, since we touch A anyways, we could also consider converting it to a bash array. Looking at the current PoC code, it should not be that hard™.
See comment #8. Any implementation should consider _all_ variables defined in table 11.1 <https://projects.gentoo.org/pms/8/pms.html#x1-10900011.1>, plus possibly ECLASS, INHERITED and DEFINED_PHASES which are defined in section 7.4 <https://projects.gentoo.org/pms/8/pms.html#x1-700007.4>. Exceptions are HOME and TMPDIR.
(In reply to Ulrich Müller from comment #23) > See comment #8. Any implementation should consider _all_ variables defined > in table 11.1 <https://projects.gentoo.org/pms/8/pms.html#x1-10900011.1>, > plus possibly ECLASS, INHERITED and DEFINED_PHASES which are defined in > section 7.4 <https://projects.gentoo.org/pms/8/pms.html#x1-700007.4>. > > Exceptions are HOME and TMPDIR. I've looked into this, and it appears to be a rabbit hole. Many of portage's helpers make assumption that certain PMS variables are involved. As of now, I've identified: - T (used by newins) - ED (used by doins) a quick grep in portage/bin shows that PF, EPREFIX are likely also used. And this feels just like the tip of the iceberg. It seems *-qa-check also uses these variables, however I haven't looked yet if the QA checks are sourced or forked. I currently have # The following is a set of PMS § 11.1 and § 7.4 without # - TMPDIR # - HOME # because these variables are often assumed to be exported and # therefore consumed by child processes, without # - T # which is required to be exported e.g., for portage's newins # without, # - ED # which is required for doins _unexported_pms_vars = frozenset( [ # PMS § 11.1 Defined Variables "P", "PF", "PN", "CATEGORY", "PV", "PR", "PVR", "A", "AA", "FILESDIR", "DISTDIR", "WORKDIR", "S", "PORTDIR", "ECLASSDIR", "ROOT", "EROOT", "SYSROOT", "ESYSROOT", "BROOT", # "T", # "TMPDIR", # "HOME", "EPREFIX", "D", # "ED", "DESTTREE", "INSDESTTREE", "EBUILD_PHASE", "EBUILD_PHASE_FUNC", "KV", "MERGE_TYPE", "REPLACING_VERSIONS", "REPLACED_BY_VERSION", # PMS 7.4 Magic Ebuild-defined Variables "ECLASS", "INHERITED", "DEFINED_PHASES", ] ) in a local branch of portage where all these PMS variables are not exported. Looking at these variables, it seems that only A and AA are variables that may accumulate a large list of entries. And AA is not longer used since EAPI 4, so it is just A. With this in mind, I would favor https://github.com/gentoo/portage/commit/9f93e30eb9b716f7e4adbb71d11a57911872201c (from https://github.com/gentoo/portage/pull/1357), which only unexports A and has left the proof of concept stage.
(In reply to Florian Schmaus from comment #24) > > in a local branch of portage where all these PMS variables are not exported. > Looking at these variables, it seems that only A and AA are variables that > may accumulate a large list of entries. It's not just about that. It's: a) the principle; b) the fact that these variables sometimes break Makefiles (especially P, sometimes D). Of course, difficulty of implementation is a valid reason to leave it for now, even if not ideal.
(In reply to Sam James from comment #25) > (In reply to Florian Schmaus from comment #24) > > > > in a local branch of portage where all these PMS variables are not exported. > > Looking at these variables, it seems that only A and AA are variables that > > may accumulate a large list of entries. > > It's not just about that. It's: a) the principle; b) the fact that these > variables sometimes break Makefiles (especially P, sometimes D). Fair point. I've created PR with the second variant of the patch, where more than just A is unexported, at https://github.com/gentoo/portage/pull/1407. As you see, we could easily include P and D there. P seems to be unused in portage/bin and D only used if the EAPI has no prefix variables (so not relevant for EAPI 9). Seems like we are able to unexport P and D. \o/ Feel free to comment on the PR about further variables that should be unexported.
> Fair point. I've created PR with the second variant of the patch, where more > than just A is unexported, at https://github.com/gentoo/portage/pull/1407. Maybe I am missing something, but I believe that this would break helpers (e.g. dodoc). Also, I don't entirely understand what your criteria for exporting are. I had outlined some grouping in comment #12, and had intended this as a starting point for further discussion. IMHO it makes little sense to restart from scratch again.
(In reply to Sam James from comment #25) > (In reply to Florian Schmaus from comment #24) > > > > in a local branch of portage where all these PMS variables are not exported. > > Looking at these variables, it seems that only A and AA are variables that > > may accumulate a large list of entries. > > It's not just about that. It's: a) the principle; b) the fact that these > variables sometimes break Makefiles (especially P, sometimes D). > > Of course, difficulty of implementation is a valid reason to leave it for > now, even if not ideal. Regarding discussion about purity, as it is "more clean" to not export anything even if it is "needed by helpers", I would say that helpers should source the private environment file (proposed to be located via ${PORTAGE_EBUILD_EXTRA_SOURCE} in the PR). This could be a general thing for "conceptual variables", even the ones currently guarded as __E_* Thereby addressing point a) So we could go back down to HOME and TMPDIR (both defined by POSIX, likely exist regardless, need to align with what portage wants them to be / need to have the effect of changing what buildsystem commands see as "the system temporary directory" etc) plus a single variable that says where to get further information from
(In reply to Florian Schmaus from comment #24) > # - T > # which is required to be exported e.g., for portage's newins > # without, > # - ED > # which is required for doins I think it'd be fine to for the first (or second) step to just rename them, say to __PORTAGE_T and __PORTAGE_ED.
It seems we are perusing https://github.com/gentoo/portage/pull/1407/ (which does not export any PMS variables besides TMPDIR and HOME) over https://github.com/gentoo/portage/pull/1357 (which only does not export A).
First draft posted to gentoo-pms: https://public-inbox.gentoo.org/gentoo-pms/20250108084419.12415-1-ulm@gentoo.org/T/#u
This is a good proposal, and besides what has already been stated here, there are some cases when exported PMS variables led to obscure test failures.
The bug has been closed via the following commit(s): https://gitweb.gentoo.org/proj/portage.git/commit/?id=c75b61b523457f8798c66afe4bcdc1708408ea00 commit c75b61b523457f8798c66afe4bcdc1708408ea00 Author: Florian Schmaus <flow@gentoo.org> AuthorDate: 2024-04-02 20:28:30 +0000 Commit: Mike Gilbert <floppym@gentoo.org> CommitDate: 2025-02-01 17:13:46 +0000 Do not export PMS variables in EAPI 9 (or if FEATURES=-export-pms-vars) Instead of passing PMS variables as part of the process environment, we pass them via a file that is later sourced by the ebuild's bash. Since, for example, A is usually the greatest contributor to the process environment, removing it from the process environment significantly avoids running into MAX_ARG_STRLEN when spawning a new child process. This means that A and other PMS variables are no longer exported in the ebuild and hence unavaiable to child processes. However, A is mostly used as part of the default_src_unpack function and there A does not need to be exported. This started as a change that only unexported A, but was later extended to most PMS variables as suggested by ulm in https://bugs.gentoo.org/721088#c23. Thanks to Zac Medico for helpful input on this change, and to Eli Schwartz for suggesting that (bash) helpers should simply source the environment file introduced by this change. Closes: https://bugs.gentoo.org/721088 Signed-off-by: Florian Schmaus <flow@gentoo.org> Closes: https://github.com/gentoo/portage/pull/1407 Signed-off-by: Mike Gilbert <floppym@gentoo.org> bin/isolated-functions.sh | 9 +++ bin/phase-functions.sh | 13 ++++ cnf/make.globals | 2 +- lib/_emerge/EbuildMetadataPhase.py | 7 ++ lib/portage/const.py | 1 + lib/portage/eapi.py | 8 ++ lib/portage/package/ebuild/doebuild.py | 132 ++++++++++++++++++++++++++++++++- 7 files changed, 170 insertions(+), 2 deletions(-)