Gentoo Websites Logo
Go to: Gentoo Home Documentation Forums Lists Bugs Planet Store Wiki Get Gentoo!
Bug 439356 - sys-apps/portage: "nonfatal die()" doesn't
Summary: sys-apps/portage: "nonfatal die()" doesn't
Status: RESOLVED WONTFIX
Alias: None
Product: Portage Development
Classification: Unclassified
Component: Core (show other bugs)
Hardware: All All
: Normal normal (vote)
Assignee: Portage team
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2012-10-23 02:18 UTC by Greg Turner
Modified: 2014-07-03 18:57 UTC (History)
0 users

See Also:
Package list:
Runtime testing required: ---


Attachments
Fix nonfatal, die, and helper code to comport with PMS (portage_fix_nonfatal_die.patch,7.75 KB, patch)
2012-10-23 02:20 UTC, Greg Turner
Details | Diff
Fix nonfatal, die, and helper code to comport with PMS (portage_fix_nonfatal_die_spin2.patch,14.73 KB, patch)
2012-11-12 12:06 UTC, Greg Turner
Details | Diff
fix nonfatal, die, and helper-function code to comport with PMS (diedoesnt-alpha150.patch,15.20 KB, patch)
2013-01-14 13:31 UTC, Greg Turner
Details | Diff
Updated version of patch for 2.2.0_alpha163 (diedoesnt-alpha163.patch,17.30 KB, patch)
2013-02-22 21:58 UTC, Greg Turner
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Greg Turner 2012-10-23 02:18:48 UTC
Fix bugs in __helpers_die() and nonfatal

Amongst other issues, this resolves what has been determined to be an incorrect implementation of nonfatal(), which causes die() to return an error to callers instead of die()ing as required by PMS in >=EAPI4

The enclosed patch implements the following changes:

  bin/isolated-functions.sh: __helpers_die(): rewire to only
  use die() when actually die()ing; use annotated ewarn instead
  of unceremoneously dumping to stderr when deciding not to die
  after all

  bin/isolated-functions.sh: die(): remove code preventing death
  from actually occuring when [[ ${PORTAGE_NONFATAL} -eq 1 ]]

  bin/isolated-functions.sh: __assert_sigpipe_ok(): use
  __helpers_die instead of die; also, rename to
  __helpers_assert_sigpipe_ok

  bin/helper-functinos.sh: __redirect_alloc_fd(): use
  __helpers_die instead of die

  bin/isolated-functions.sh: nonfatal: don't die when nonfatal
  command is issued without arguments; instead emit an error
  and return 1

  bin/phase-helpers.sh: unpack_tar(): use the new name
  __helpers_assert_sigpipe_ok instead of
  __assert_sigpipe_ok

  bin/phase-helpers.sh: unpack(): likewise

  bin/save-ebuild-env.sh: __save_ebuild_env(): likewise

  doc/package/ebuild/eapi/4.docbook: Clarify some things about
  how nonfatal, die, assert, and helper functions relate to
  each other.  In particular, make it clear that nonfatal is
  not designed to be applied to constructs other than helper
  function invocations.  Give an example of how to use
  nonfatal.  Document explicitly that the die() and
  assert() helpers are not affected by nonfatal.

These changes are an attempt to address the issues identified
in this thread:

  http://thread.gmane.org/gmane.linux.gentoo.devel/80780

In addition to the issues and solutions agreed upon in that
thread, the patch makes one additional "change": it allows
assert() to continue to die() regardless of PORTAGE_NONFATAL.
Since assert() was already calling die(), no change to the
source was neccesary to achieve this; but it is worth noting,
because this is nevertheless a novel semantic change I have
decided to make without consulting anybody.  I am pretty
darn confident that it's the right change to make, given the
consensus from the aformentioned thread, however, I suppose
I'm not 100% sure and some SME should approve this change --
it may require some kind of update to the PMS.

(a bit off topic: IMHO, the PMS content pertaining to this
entire matter could really use some editing, but I'll leave
that to others to fix, since I don't very well understand
the process by which PMS changes are made.  If someone would
like to clue me in on that, drop me a line at the address
to be found at http://gmturner.com/contact; I'd be happy to
take a crack at it, now that I feel like I understand the
intent behind the existing language.)


Reproducible: Always
Comment 1 Greg Turner 2012-10-23 02:20:17 UTC
Created attachment 327202 [details, diff]
Fix nonfatal, die, and helper code to comport with PMS
Comment 2 Zac Medico gentoo-dev 2012-10-23 03:08:58 UTC
If I understand correctly, your patch match nofatal() work with the unpack() helper, though it did not previously. PMS does _NOT_ mention that unpack failure is EAPI dependent:

  http://dev.gentoo.org/~ulm/pms/4/pms.html#x1-14200012.3.3.13

If it was EAPI dependent, then it should mention it just like it does for the dosed helper.
Comment 3 Zac Medico gentoo-dev 2012-10-23 03:22:33 UTC
Actually, I think your patch is correct according to the "nonfatal" documentation:

  http://dev.gentoo.org/~ulm/pms/4/pms.html#x1-13000012.3.3.1

Specifically, it says:

  "If this results in a command being called that would normally abort the build process due to a failure (but not due to an explicit die or assert call), instead a non-zero exit status shall be returned."
Comment 4 Zac Medico gentoo-dev 2012-10-23 03:30:32 UTC
I see two issues with the patch:

1) __redirect_alloc_fd needs to be fatal regardless of the nonfatal helper, because there's no way for the calling code to recover if it fails

2) __helpers_assert_sigpipe_ok should probably break out of the loop on the first error encountered, since otherwise it could call __helpers_die "$@" repeatedly
Comment 5 Greg Turner 2012-10-23 08:14:26 UTC
(In reply to comment #4)
> I see two issues with the patch:
> 
> 1) __redirect_alloc_fd needs to be fatal regardless of the nonfatal helper,
> because there's no way for the calling code to recover if it fails

Really?  I thought it only ever got called by regular old top-level bash before any forking or otherwise interesting stuff happened?  But the multiprocessing code makes me dyslexic and stupid, I'm happy to take your word for it.  I'll revert for now.

> 2) __helpers_assert_sigpipe_ok should probably break out of the loop on the
> first error encountered, since otherwise it could call __helpers_die "$@"
> repeatedly

Haha, that's pretty gross; thanks for noticing -- not-very-well-tested code obviously.  Glad that didn't go in and make everyone's eyes bleed for a year or two :)
Comment 6 Zac Medico gentoo-dev 2012-10-23 15:18:01 UTC
(In reply to comment #5)
> (In reply to comment #4)
> > I see two issues with the patch:
> > 
> > 1) __redirect_alloc_fd needs to be fatal regardless of the nonfatal helper,
> > because there's no way for the calling code to recover if it fails
> 
> Really?  I thought it only ever got called by regular old top-level bash
> before any forking or otherwise interesting stuff happened?

Notice that the retrun value from __redirect_alloc_fd is not checked by __multijob_init though, so it assume success, and if it doesn't die when it's supposed to then the multijob stuff that follows will fail horribly. In practice, it's not supposed to fail. The die is just an assertion so it fails early in those rare cases when it should.
Comment 7 Greg Turner 2012-10-23 19:08:54 UTC
I've found more helpers with misconfigured die()... working on it, but may take a few days to get everything cleaned up and tested.

I also want to confirm that changing the helpers to return errors doesn't break too many existing EAPI0-3 ebuilds/eclasses too badly -- otherwise this bug may end up with three patches, one for portage, one for gentoo-x86, and one for prefix :(
Comment 8 Greg Turner 2012-10-23 19:20:41 UTC
(In reply to comment #6)
> (In reply to comment #5)
> > (In reply to comment #4)
> > > I see two issues with the patch:
> > > 
> > > 1) __redirect_alloc_fd needs to be fatal regardless of the nonfatal helper,
> > > because there's no way for the calling code to recover if it fails
> > 
> > Really?  I thought it only ever got called by regular old top-level bash
> > before any forking or otherwise interesting stuff happened?
> 
> Notice that the retrun value from __redirect_alloc_fd is not checked by
> __multijob_init though, so it assume success, and if it doesn't die when
> it's supposed to then the multijob stuff that follows will fail horribly. In
> practice, it's not supposed to fail. The die is just an assertion so it
> fails early in those rare cases when it should.

ATM it does manage to fail on cygwin, every time :)  Hardly portage's fault, though -- it's effectively a kernel bug.  Anyhow I may look into this once more after I finish the new pile of work I've just identified... as of now, my branch for this bug reverts this change, so unless I can figure out something better to do between now and then, it will squash into my next spin.
Comment 9 Greg Turner 2012-11-03 07:18:55 UTC
Sorry life got in the way a bit there... returning to this presently.  I should really publish my portage tree somewhere.
Comment 10 Greg Turner 2012-11-12 12:06:29 UTC
Created attachment 329324 [details, diff]
Fix nonfatal, die, and helper code to comport with PMS

Sorry, this time life /really/ got in the way; I've been unexpectedly plunged into systems-administrative hell and expect to remain there for at least another day, and possibly considerably longer, depending on circumstances beyond my control.

I applied for an account on the gcc compile farm, hoping I could continue this work there, but so far nobody has gotten back to me; it's been several days so it looks like I may have to figure out how to nag them about it via some other medium, to get my request looked at.

Anyhow, here is a dump of what was in my tree, so that it can at least get reviewed while I sort out my technical difficulties.  Please bear in mind that much of this code has never been tested; it probably shouldn't be committed without some testing even if it "looks good."

Once I get my systems back in order, I'll definitely pick up where I left off and make sure this doesn't break anything.
Comment 11 Greg Turner 2013-01-14 13:31:23 UTC
Created attachment 335606 [details, diff]
fix nonfatal, die, and helper-function code to comport with PMS

Resync with 2.2.0_alpha150

After neglecting this for a long while, seems as good a time as any to start wrapping this up.  This naively updates my previous patch to apply against the most recent portage in gentoo-x86 without worrying (yet) about additional incorrect die'ing that may have cropped up in the interim (but stay tuned for that...).

Henceforth, I'll be keeping the latest incarnation of this patchset in my main workstation's portage, which should give it a pretty extensive workout (I have an -e world or three in my near future), so, a bit more time for this, please, but once I'm satisfied everything is in order, I'll declare this as unofficially-but-so-say-I kosher and recommend upstream application.
Comment 12 Greg Turner 2013-02-22 21:58:41 UTC
Created attachment 339754 [details, diff]
Updated version of patch for 2.2.0_alpha163

This updates my patch to apply against 2.2.0_alpha163 (specifically, against the prepared sources in /var/tmp/portage, sorry, ideally I'd be working against git, but I'm lazy).

It also modifies several new "die"'s that have cropped up in the interim; some of them are left alone, however, for various technical reasons (see comments in the patch).

So far, this has been working fine in my portage.  I'd be reasonably comfortable with it going upstream at this point.  Alternatively, it can wait -- it could benefit from another big correctness audit by now -- although of course the complexity and size of the patch is only going to grow the longer this bug languishes.
Comment 13 Zac Medico gentoo-dev 2013-02-25 00:29:05 UTC
For cases of incorrect usage, I think it's probably to call die rather than __helpers_die, with the justification being that incorrect code should be fixed. Examples of this include:

  * nonfatal with zero arguments
  * unpack with zero arguments
  * unpack arguments that are absolute or begin with ${DISTDIR}
  * insopts -s argument
  * unused arguments for master_repositories, repository_path, etc...

Also, we need to be careful not to change behavior for EAPIs prior to 4. For example, in unpack you have this change:

-	[[ ! -s ${srcdir}${x} ]] && die "${x} does not exist"
+	[[ -s ${srcdir}${x} ]] || __helpers_die "${x} does not exist" || return 1

This changes behavior for older EAPIs, since they used to die here instead of return 1. We need to avoid this kind of change.

This following change won't work as intended:

 				if [ $? -ne 0 ]; then
+					local my_result=$?
 					echo "${my_output}" >&2
-					die "$myfail"
+					__helpers_die "$myfail"
+					return ${my_result}
 				fi

In this case, you need to save my_result=$? before the if test, since otherwise the previous $? value is lost.

In unpack, we can use find -exec rather than -execdir, since it should make no difference with the arguments give there (and maybe -execdir is not supported by older versions of find).
Comment 14 Zac Medico gentoo-dev 2013-02-25 00:32:17 UTC
Also, I think we should continue to use die instead of s_helpers_die for commands that should never fail. For example, see the CONFIG_SHELL code inside econf. There's no reason that code should ever fail, so let's continue to use die there.
Comment 15 Zac Medico gentoo-dev 2013-02-25 00:35:36 UTC
(In reply to comment #13)
> Also, we need to be careful not to change behavior for EAPIs prior to 4. For
> example, in unpack you have this change:

This seems to apply to most or all of the __helpers_die calls inside unpack.
Comment 16 Zac Medico gentoo-dev 2013-02-25 00:48:09 UTC
(In reply to comment #14)
> Also, I think we should continue to use die instead of __helpers_die for
> commands that should never fail.

The find/chmod at the end of unpack should never fail, so I think we should use die there.
Comment 17 Greg Turner 2013-02-26 00:51:22 UTC
(In reply to comment #13)
> For cases of incorrect usage, I think it's probably to call die rather than
> __helpers_die, with the justification being that incorrect code should be
> fixed. Examples of this include:
> 
>   * nonfatal with zero arguments
>   * unpack with zero arguments
>   * unpack arguments that are absolute or begin with ${DISTDIR}
>   * insopts -s argument
>   * unused arguments for master_repositories, repository_path, etc...

ACK, will revise patches (eventually, I'll be out of town and unlikely to work on this for a couple of weeks, unfortunately).

> Also, we need to be careful not to change behavior for EAPIs prior to 4. For
> example, in unpack you have this change:
> 
> -	[[ ! -s ${srcdir}${x} ]] && die "${x} does not exist"
> +	[[ -s ${srcdir}${x} ]] || __helpers_die "${x} does not exist" || return 1
>
> This changes behavior for older EAPIs, since they used to die here instead
> of return 1. We need to avoid this kind of change.

NACK, __helpers_die dies or doesn't, appropriately, according to EAPI.

> This following change won't work as intended:
> 
>  				if [ $? -ne 0 ]; then
> +					local my_result=$?
>  					echo "${my_output}" >&2
> -					die "$myfail"
> +					__helpers_die "$myfail"
> +					return ${my_result}
>  				fi
> 
> In this case, you need to save my_result=$? before the if test, since
> otherwise the previous $? value is lost.

ACK, will fix.

> In unpack, we can use find -exec rather than -execdir, since it should make
> no difference with the arguments give there (and maybe -execdir is not
> supported by older versions of find).

I don't remember the particulars, but I somehow convinced myself that there was a good reason for this.  I'll have to investigate and get back to you on that.
Comment 18 Greg Turner 2013-02-26 01:07:20 UTC
(In reply to comment #16)
> (In reply to comment #14)
> > Also, I think we should continue to use die instead of __helpers_die for
> > commands that should never fail.
> 
> The find/chmod at the end of unpack should never fail, so I think we should
> use die there.

I guess my feeling about this depends on what you mean by "should."  Certainly I could concoct some not-wholly-implausible scenario where it does indeed fail (i.e., ${PORTAGE_TMPDIR} is a network drive or virtualized filesystem that rejects the permissions change; I could see this happening pretty readily in PREFIX portage).

But perhaps you mean by "should" that such a failure is simply too problematic to let the ebuild continue?  We'd be second-guessing the PMS a bit, I guess, but I'm not opposed to that, and we've already crossed that rubicon, anyhow.

The question may be fairly academic, in practice, but maybe that's "famous last words..."  I'll try to take a look at the code in question and spend a few moments contemplating how it's being used in practice, and get back with some kind of informed opinion on the matter.

Do bear in mind my NACK, above, which impacts all such questions -- unless I'm very mistaken, __helpers_die() should be as good as die() for the newer EAPI's; if not, I think that's a __helpers_die() bug, not necessarily a helper bug.
Comment 19 Greg Turner 2013-02-26 01:24:45 UTC
(In reply to comment #13)
> Also, we need to be careful not to change behavior for EAPIs prior to 4. For
> example, in unpack you have this change:
> 
> -	[[ ! -s ${srcdir}${x} ]] && die "${x} does not exist"
> +	[[ -s ${srcdir}${x} ]] || __helpers_die "${x} does not exist" || return 1
> 
> This changes behavior for older EAPIs, since they used to die here instead
> of return 1. We need to avoid this kind of change.

Ah, shit.

Ignore my NACK above, I see now, that I didn't read your comment carefully enough, and you meant exactly the converse of what I thought you meant.

Now that I do understand, though, I'm a bit surprised, inasmuch as I thought the the PMS changes /required/ us to implement certain behavioral changes to existing EAPI behavior.

Which is /not/ to say that I disagree.  Just that I never quite grokked that this was a priority.  I can very well see how this could be a problem, for example, when upgrading ancient systems to new portages.

I think, given that this was a "hidden" (to me) requirement I never really had in mind when I wrote these patches, I really ought to do a complete review, bearing in mind your comments above, but also just thinking generally about what kind of EAPI behavior changes I might be inadvertently causing.

So plenty of work for me to do now :)  I'm back from my trip around the middle of March.  Given that that's tax time I may have trouble jumping right on it so it seems this bug may need to languish for a while yet.  But please be assured I haven't forgotten or abandoned it.  In the meanwhile, I'll keep using these patches locally as a means to smoke out any bugs that may be lingering therein.
Comment 20 Zac Medico gentoo-dev 2013-02-26 01:27:28 UTC
(In reply to comment #18)
> But perhaps you mean by "should" that such a failure is simply too
> problematic to let the ebuild continue?  We'd be second-guessing the PMS a
> bit, I guess, but I'm not opposed to that, and we've already crossed that
> rubicon, anyhow.

PMS says "normally abort":

nonfatal Executes the remainder of its arguments as a command, preserving the exit status. If thisvresults in a command being called that would normally abort the build process due to a failure (but not due to an explicit die or assert call), instead a non-zero exit status shall be returned. Only in EAPIs listed in table 11.7 as supporting nonfatal.

We're free to interpret "normally abort" however it makes sense within the context of any given helper.

> Do bear in mind my NACK, above, which impacts all such questions -- unless
> I'm very mistaken, __helpers_die() should be as good as die() for the newer
> EAPI's; if not, I think that's a __helpers_die() bug, not necessarily a
> helper bug.

It's fine for the new EAPIs. The problem is with the older EAPIs. We can add something like a --die-for-older-eapis parameter to __helers_die, which will cause it to die for older EAPIs when we want it to.
Comment 21 Greg Turner 2014-07-03 18:57:27 UTC
I'm too lazy/busy to bother anymore.  Tons has changed, this has languished for ages -- I don't even know if this is really a bug anymore.

Feel free to reopen if still valid and you want to pick up where I left off.