Fix bugs in __helpers_die() and nonfatal Amongst other issues, this resolves what has been determined to be an incorrect implementation of nonfatal(), which causes die() to return an error to callers instead of die()ing as required by PMS in >=EAPI4 The enclosed patch implements the following changes: bin/isolated-functions.sh: __helpers_die(): rewire to only use die() when actually die()ing; use annotated ewarn instead of unceremoneously dumping to stderr when deciding not to die after all bin/isolated-functions.sh: die(): remove code preventing death from actually occuring when [[ ${PORTAGE_NONFATAL} -eq 1 ]] bin/isolated-functions.sh: __assert_sigpipe_ok(): use __helpers_die instead of die; also, rename to __helpers_assert_sigpipe_ok bin/helper-functinos.sh: __redirect_alloc_fd(): use __helpers_die instead of die bin/isolated-functions.sh: nonfatal: don't die when nonfatal command is issued without arguments; instead emit an error and return 1 bin/phase-helpers.sh: unpack_tar(): use the new name __helpers_assert_sigpipe_ok instead of __assert_sigpipe_ok bin/phase-helpers.sh: unpack(): likewise bin/save-ebuild-env.sh: __save_ebuild_env(): likewise doc/package/ebuild/eapi/4.docbook: Clarify some things about how nonfatal, die, assert, and helper functions relate to each other. In particular, make it clear that nonfatal is not designed to be applied to constructs other than helper function invocations. Give an example of how to use nonfatal. Document explicitly that the die() and assert() helpers are not affected by nonfatal. These changes are an attempt to address the issues identified in this thread: http://thread.gmane.org/gmane.linux.gentoo.devel/80780 In addition to the issues and solutions agreed upon in that thread, the patch makes one additional "change": it allows assert() to continue to die() regardless of PORTAGE_NONFATAL. Since assert() was already calling die(), no change to the source was neccesary to achieve this; but it is worth noting, because this is nevertheless a novel semantic change I have decided to make without consulting anybody. I am pretty darn confident that it's the right change to make, given the consensus from the aformentioned thread, however, I suppose I'm not 100% sure and some SME should approve this change -- it may require some kind of update to the PMS. (a bit off topic: IMHO, the PMS content pertaining to this entire matter could really use some editing, but I'll leave that to others to fix, since I don't very well understand the process by which PMS changes are made. If someone would like to clue me in on that, drop me a line at the address to be found at http://gmturner.com/contact; I'd be happy to take a crack at it, now that I feel like I understand the intent behind the existing language.) Reproducible: Always
Created attachment 327202 [details, diff] Fix nonfatal, die, and helper code to comport with PMS
If I understand correctly, your patch match nofatal() work with the unpack() helper, though it did not previously. PMS does _NOT_ mention that unpack failure is EAPI dependent: http://dev.gentoo.org/~ulm/pms/4/pms.html#x1-14200012.3.3.13 If it was EAPI dependent, then it should mention it just like it does for the dosed helper.
Actually, I think your patch is correct according to the "nonfatal" documentation: http://dev.gentoo.org/~ulm/pms/4/pms.html#x1-13000012.3.3.1 Specifically, it says: "If this results in a command being called that would normally abort the build process due to a failure (but not due to an explicit die or assert call), instead a non-zero exit status shall be returned."
I see two issues with the patch: 1) __redirect_alloc_fd needs to be fatal regardless of the nonfatal helper, because there's no way for the calling code to recover if it fails 2) __helpers_assert_sigpipe_ok should probably break out of the loop on the first error encountered, since otherwise it could call __helpers_die "$@" repeatedly
(In reply to comment #4) > I see two issues with the patch: > > 1) __redirect_alloc_fd needs to be fatal regardless of the nonfatal helper, > because there's no way for the calling code to recover if it fails Really? I thought it only ever got called by regular old top-level bash before any forking or otherwise interesting stuff happened? But the multiprocessing code makes me dyslexic and stupid, I'm happy to take your word for it. I'll revert for now. > 2) __helpers_assert_sigpipe_ok should probably break out of the loop on the > first error encountered, since otherwise it could call __helpers_die "$@" > repeatedly Haha, that's pretty gross; thanks for noticing -- not-very-well-tested code obviously. Glad that didn't go in and make everyone's eyes bleed for a year or two :)
(In reply to comment #5) > (In reply to comment #4) > > I see two issues with the patch: > > > > 1) __redirect_alloc_fd needs to be fatal regardless of the nonfatal helper, > > because there's no way for the calling code to recover if it fails > > Really? I thought it only ever got called by regular old top-level bash > before any forking or otherwise interesting stuff happened? Notice that the retrun value from __redirect_alloc_fd is not checked by __multijob_init though, so it assume success, and if it doesn't die when it's supposed to then the multijob stuff that follows will fail horribly. In practice, it's not supposed to fail. The die is just an assertion so it fails early in those rare cases when it should.
I've found more helpers with misconfigured die()... working on it, but may take a few days to get everything cleaned up and tested. I also want to confirm that changing the helpers to return errors doesn't break too many existing EAPI0-3 ebuilds/eclasses too badly -- otherwise this bug may end up with three patches, one for portage, one for gentoo-x86, and one for prefix :(
(In reply to comment #6) > (In reply to comment #5) > > (In reply to comment #4) > > > I see two issues with the patch: > > > > > > 1) __redirect_alloc_fd needs to be fatal regardless of the nonfatal helper, > > > because there's no way for the calling code to recover if it fails > > > > Really? I thought it only ever got called by regular old top-level bash > > before any forking or otherwise interesting stuff happened? > > Notice that the retrun value from __redirect_alloc_fd is not checked by > __multijob_init though, so it assume success, and if it doesn't die when > it's supposed to then the multijob stuff that follows will fail horribly. In > practice, it's not supposed to fail. The die is just an assertion so it > fails early in those rare cases when it should. ATM it does manage to fail on cygwin, every time :) Hardly portage's fault, though -- it's effectively a kernel bug. Anyhow I may look into this once more after I finish the new pile of work I've just identified... as of now, my branch for this bug reverts this change, so unless I can figure out something better to do between now and then, it will squash into my next spin.
Sorry life got in the way a bit there... returning to this presently. I should really publish my portage tree somewhere.
Created attachment 329324 [details, diff] Fix nonfatal, die, and helper code to comport with PMS Sorry, this time life /really/ got in the way; I've been unexpectedly plunged into systems-administrative hell and expect to remain there for at least another day, and possibly considerably longer, depending on circumstances beyond my control. I applied for an account on the gcc compile farm, hoping I could continue this work there, but so far nobody has gotten back to me; it's been several days so it looks like I may have to figure out how to nag them about it via some other medium, to get my request looked at. Anyhow, here is a dump of what was in my tree, so that it can at least get reviewed while I sort out my technical difficulties. Please bear in mind that much of this code has never been tested; it probably shouldn't be committed without some testing even if it "looks good." Once I get my systems back in order, I'll definitely pick up where I left off and make sure this doesn't break anything.
Created attachment 335606 [details, diff] fix nonfatal, die, and helper-function code to comport with PMS Resync with 2.2.0_alpha150 After neglecting this for a long while, seems as good a time as any to start wrapping this up. This naively updates my previous patch to apply against the most recent portage in gentoo-x86 without worrying (yet) about additional incorrect die'ing that may have cropped up in the interim (but stay tuned for that...). Henceforth, I'll be keeping the latest incarnation of this patchset in my main workstation's portage, which should give it a pretty extensive workout (I have an -e world or three in my near future), so, a bit more time for this, please, but once I'm satisfied everything is in order, I'll declare this as unofficially-but-so-say-I kosher and recommend upstream application.
Created attachment 339754 [details, diff] Updated version of patch for 2.2.0_alpha163 This updates my patch to apply against 2.2.0_alpha163 (specifically, against the prepared sources in /var/tmp/portage, sorry, ideally I'd be working against git, but I'm lazy). It also modifies several new "die"'s that have cropped up in the interim; some of them are left alone, however, for various technical reasons (see comments in the patch). So far, this has been working fine in my portage. I'd be reasonably comfortable with it going upstream at this point. Alternatively, it can wait -- it could benefit from another big correctness audit by now -- although of course the complexity and size of the patch is only going to grow the longer this bug languishes.
For cases of incorrect usage, I think it's probably to call die rather than __helpers_die, with the justification being that incorrect code should be fixed. Examples of this include: * nonfatal with zero arguments * unpack with zero arguments * unpack arguments that are absolute or begin with ${DISTDIR} * insopts -s argument * unused arguments for master_repositories, repository_path, etc... Also, we need to be careful not to change behavior for EAPIs prior to 4. For example, in unpack you have this change: - [[ ! -s ${srcdir}${x} ]] && die "${x} does not exist" + [[ -s ${srcdir}${x} ]] || __helpers_die "${x} does not exist" || return 1 This changes behavior for older EAPIs, since they used to die here instead of return 1. We need to avoid this kind of change. This following change won't work as intended: if [ $? -ne 0 ]; then + local my_result=$? echo "${my_output}" >&2 - die "$myfail" + __helpers_die "$myfail" + return ${my_result} fi In this case, you need to save my_result=$? before the if test, since otherwise the previous $? value is lost. In unpack, we can use find -exec rather than -execdir, since it should make no difference with the arguments give there (and maybe -execdir is not supported by older versions of find).
Also, I think we should continue to use die instead of s_helpers_die for commands that should never fail. For example, see the CONFIG_SHELL code inside econf. There's no reason that code should ever fail, so let's continue to use die there.
(In reply to comment #13) > Also, we need to be careful not to change behavior for EAPIs prior to 4. For > example, in unpack you have this change: This seems to apply to most or all of the __helpers_die calls inside unpack.
(In reply to comment #14) > Also, I think we should continue to use die instead of __helpers_die for > commands that should never fail. The find/chmod at the end of unpack should never fail, so I think we should use die there.
(In reply to comment #13) > For cases of incorrect usage, I think it's probably to call die rather than > __helpers_die, with the justification being that incorrect code should be > fixed. Examples of this include: > > * nonfatal with zero arguments > * unpack with zero arguments > * unpack arguments that are absolute or begin with ${DISTDIR} > * insopts -s argument > * unused arguments for master_repositories, repository_path, etc... ACK, will revise patches (eventually, I'll be out of town and unlikely to work on this for a couple of weeks, unfortunately). > Also, we need to be careful not to change behavior for EAPIs prior to 4. For > example, in unpack you have this change: > > - [[ ! -s ${srcdir}${x} ]] && die "${x} does not exist" > + [[ -s ${srcdir}${x} ]] || __helpers_die "${x} does not exist" || return 1 > > This changes behavior for older EAPIs, since they used to die here instead > of return 1. We need to avoid this kind of change. NACK, __helpers_die dies or doesn't, appropriately, according to EAPI. > This following change won't work as intended: > > if [ $? -ne 0 ]; then > + local my_result=$? > echo "${my_output}" >&2 > - die "$myfail" > + __helpers_die "$myfail" > + return ${my_result} > fi > > In this case, you need to save my_result=$? before the if test, since > otherwise the previous $? value is lost. ACK, will fix. > In unpack, we can use find -exec rather than -execdir, since it should make > no difference with the arguments give there (and maybe -execdir is not > supported by older versions of find). I don't remember the particulars, but I somehow convinced myself that there was a good reason for this. I'll have to investigate and get back to you on that.
(In reply to comment #16) > (In reply to comment #14) > > Also, I think we should continue to use die instead of __helpers_die for > > commands that should never fail. > > The find/chmod at the end of unpack should never fail, so I think we should > use die there. I guess my feeling about this depends on what you mean by "should." Certainly I could concoct some not-wholly-implausible scenario where it does indeed fail (i.e., ${PORTAGE_TMPDIR} is a network drive or virtualized filesystem that rejects the permissions change; I could see this happening pretty readily in PREFIX portage). But perhaps you mean by "should" that such a failure is simply too problematic to let the ebuild continue? We'd be second-guessing the PMS a bit, I guess, but I'm not opposed to that, and we've already crossed that rubicon, anyhow. The question may be fairly academic, in practice, but maybe that's "famous last words..." I'll try to take a look at the code in question and spend a few moments contemplating how it's being used in practice, and get back with some kind of informed opinion on the matter. Do bear in mind my NACK, above, which impacts all such questions -- unless I'm very mistaken, __helpers_die() should be as good as die() for the newer EAPI's; if not, I think that's a __helpers_die() bug, not necessarily a helper bug.
(In reply to comment #13) > Also, we need to be careful not to change behavior for EAPIs prior to 4. For > example, in unpack you have this change: > > - [[ ! -s ${srcdir}${x} ]] && die "${x} does not exist" > + [[ -s ${srcdir}${x} ]] || __helpers_die "${x} does not exist" || return 1 > > This changes behavior for older EAPIs, since they used to die here instead > of return 1. We need to avoid this kind of change. Ah, shit. Ignore my NACK above, I see now, that I didn't read your comment carefully enough, and you meant exactly the converse of what I thought you meant. Now that I do understand, though, I'm a bit surprised, inasmuch as I thought the the PMS changes /required/ us to implement certain behavioral changes to existing EAPI behavior. Which is /not/ to say that I disagree. Just that I never quite grokked that this was a priority. I can very well see how this could be a problem, for example, when upgrading ancient systems to new portages. I think, given that this was a "hidden" (to me) requirement I never really had in mind when I wrote these patches, I really ought to do a complete review, bearing in mind your comments above, but also just thinking generally about what kind of EAPI behavior changes I might be inadvertently causing. So plenty of work for me to do now :) I'm back from my trip around the middle of March. Given that that's tax time I may have trouble jumping right on it so it seems this bug may need to languish for a while yet. But please be assured I haven't forgotten or abandoned it. In the meanwhile, I'll keep using these patches locally as a means to smoke out any bugs that may be lingering therein.
(In reply to comment #18) > But perhaps you mean by "should" that such a failure is simply too > problematic to let the ebuild continue? We'd be second-guessing the PMS a > bit, I guess, but I'm not opposed to that, and we've already crossed that > rubicon, anyhow. PMS says "normally abort": nonfatal Executes the remainder of its arguments as a command, preserving the exit status. If thisvresults in a command being called that would normally abort the build process due to a failure (but not due to an explicit die or assert call), instead a non-zero exit status shall be returned. Only in EAPIs listed in table 11.7 as supporting nonfatal. We're free to interpret "normally abort" however it makes sense within the context of any given helper. > Do bear in mind my NACK, above, which impacts all such questions -- unless > I'm very mistaken, __helpers_die() should be as good as die() for the newer > EAPI's; if not, I think that's a __helpers_die() bug, not necessarily a > helper bug. It's fine for the new EAPIs. The problem is with the older EAPIs. We can add something like a --die-for-older-eapis parameter to __helers_die, which will cause it to die for older EAPIs when we want it to.
I'm too lazy/busy to bother anymore. Tons has changed, this has languished for ages -- I don't even know if this is really a bug anymore. Feel free to reopen if still valid and you want to pick up where I left off.