Gentoo Websites Logo
Go to: Gentoo Home Documentation Forums Lists Bugs Planet Store Wiki Get Gentoo!
Bug 487478 - sys-apps/portage: running econf in parallel will race to update config.{sub,guess}
Summary: sys-apps/portage: running econf in parallel will race to update config.{sub,g...
Status: RESOLVED FIXED
Alias: None
Product: Portage Development
Classification: Unclassified
Component: Core - Configuration (show other bugs)
Hardware: All Linux
: Normal normal (vote)
Assignee: Portage team
URL:
Whiteboard:
Keywords: InVCS
: 516892 (view as bug list)
Depends on:
Blocks:
 
Reported: 2013-10-09 21:29 UTC by Greg Turner
Modified: 2014-07-11 15:29 UTC (History)
3 users (show)

See Also:
Package list:
Runtime testing required: ---


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Greg Turner 2013-10-09 21:29:25 UTC
I have never seen this to be a problem in practice, but in theory, multibuild.eclass-based ebuilds with parallelized econf invocations seem to race each other to replace config.{guess,sub} and the shebang in configure during econf invocations.

In a paralleled ebuilds, we really should never be mucking around with the contents of ${S} during src_configure at all.

There are obvious 99% reliable heuristic ways to move those steps into, i.e., a post-src_prepare subphase, yet, to fix this in a 100% backward-compatible way, is very hard: portage can't know which thing named "configure" in ${WORKDIR}, if any, will wind up being run via econf, until it happens, unless we require this to be declaratively specified, somehow, ahead of time, which sounds horrible.

Assuming this is, indeed, a real race-condition, what is the prognosis, given that in practice it's clearly not a huge problem?  Perhaps the existing order of operations is salvageable using a locking stratagem:

Imagine a global "${T}"/_econf_configuring_" directory-as-lock-file (like: http://mywiki.wooledge.org/BashFAQ/045) were used to serialize the first three steps in econf() after checking the phase and eqawarning about wrong-phase invocations.  Specifically:

  o The first line of configure is checked for a shebang prefixification and modified if appropriate

  o All of ${WORKDIR} is iterated through, replacing any config.{sub,guess} with preferred versions

  o depending on EAPI, the results of ./configure --help are cached in a variable.

After that, it doesn't look like econf() touches configure, or any file, for that matter, until configure is actually invoked.  So if those first three steps were globally serialized by the lock-directory, the only potential race left would be the prospect of invocation racing with the lock-protected portion of econf() in some other process.  Let's consider each step:

 o Shebang-replacement is a non-issue: the test for a bad shebang will always be negative, the second time any particular script is run by econf, barring some maintainer-mode issue causing ./configure to be regenerated.

 o config.{sub,guess}-replacement appears still to be a problem, and 

 o $(./configure --help) caching while simultaneously running the script should not be a problem.

The final remaining issue of config.{sub,guess}-replacement potentially racing with the configure script's use of those files could be solved by a second global marker, laid down in ${T}, while still holding the putative global dir-lock, after the config.{sub,guess}-replacement were completed.  Let's say we just touch "${T}"/_portage_econfed_something_ (we'd also want to be sure to wipe any remnants of that file and our lock-dir out before ebuild intiated any src_configure phase).

While holding the putative global dir-lock, all that would be required would be to look for that marker in "${T}" and skip the config.{sub,guess} replacement if it's there.  For large ${WORKDIR}s this could actually have a beneficial side effect of making econf significantly faster.

With the lock-dir and config.{sub,guess} replacement marker, we now seem to have a multiprocessing-safe econf.  All the steps that potentially race with configure invocation are always completed atomically before any econf in any process racing to run the same script can start running it, and all the steps that modify "${S}" are globally serialized....

Suspected reasons this is not a frequent problem in practice are:

 o that the iteration through ${WORKDIR} for config.{sub,guess}-replacement provides enough of a delay that no process typically manages to run configure while shebang replacement is underway

 o the ability of some configure scripts to do the right thing in spite of a bad config.{sub,guess}.

Am I missing something or is this a real problem?  I'd be happy to submit patches equivalent to the above if everyone likes the approach.
Comment 1 Mike Gilbert gentoo-dev 2013-11-28 16:07:12 UTC
Good catch. I think your proposed solution makes sense. If you want to attempt an implementation, I would be happy to review/test it.
Comment 2 Greg Turner 2013-11-29 11:12:02 UTC
(In reply to Mike Gilbert from comment #1)
> Good catch. I think your proposed solution makes sense. If you want to
> attempt an implementation, I would be happy to review/test it.

Sure, I think it will be fairly easy... pretty busy though, give me a few days.
Comment 3 Sebastian Luther (few) 2013-12-02 21:06:23 UTC
Is here anything to do for dev-portage? Otherwise please reassign.
Comment 4 Mike Gilbert gentoo-dev 2013-12-02 21:24:47 UTC
(In reply to Sebastian Luther (few) from comment #3)
> Is here anything to do for dev-portage? Otherwise please reassign.

Yes; if/when Greg provides an implementation it will be a patch against the Portage source code and dev-portage will need to review it.

I'm just volunteering to do an initial review because it is an issue I would like to see fixed.
Comment 5 Greg Turner 2013-12-17 21:45:50 UTC
Forgive me thinking out loud here, but, returning to this, I think I see a bug in my proposed solution.

The problem is with the "_portage_econfed_something_" marker -- it's based on an implicit assumption (admittedly, almost always a correct one) that the first configure will occur in a parent directory of any subsequent econf invocation.

An example case where this would not be the case is that of an ebuild exploiting the in-source build feature of multibuild for multiple ABI's in parallel (pretty easy to do with multilib-minimal, for example); this would run configure in parallel from a separate BUILD_DIR for each ABI.

I can fix the bug by renaming "${T}/_portage_econfed_something_" to the (anyhow more descriptive) name "${T}/_config_sub_guess_substituted_", and keeping a list of (root?) directories where portage has already executed the recursive substitution heuristic in that file (it would still only be touched while holding the dir-lock).

This makes for a potentially considerably more complicated solution, however, as now I'm faced with questions like, whether to do directory/subdirectory containment analysis, worry about complications due to symbolic links, and so forth.

One way to avoid all that mess would be to a flat list of realpath's in there, not attempting to decide if A is a subdirectory of B at all.  An even simpler approach would be to simply do the substitution for all of ${WORKDIR} on the first invocation... although I'm not 110% sure how reliable the assumption that everything is in ${WORKDIR} actually is.

I guess, for a first crack at it, I'll go with the flat list of absolute real-paths.  Once a directory already tests as "done" I can stop iterating through sub-directories, to hopefully preserve some of the performance benefit I was hoping to realize in my original take on the problem (technically, we no longer achieve any O(# of dirs) reduction, in this revised proposal, for revisited configure directories, but, anyhow, the point of this was never to enhance performance but to prevent races).

Actually, this makes me think about a second assumption I've made in both my original and revised proposal -- that inappropriate config.{sub,guess} files do not somehow "regenerate" in-between econf invocations.  My inclination is (it would seem) to cling to that assumption, but if anyone has any reason to question it, please let me know!
Comment 6 Greg Turner 2013-12-17 21:54:26 UTC
(In reply to Greg Turner from comment #5)
> The problem is with the "_portage_econfed_something_" marker -- it's based
> on an implicit assumption (admittedly, almost always a correct one) that the
> first configure will occur in a parent directory of any subsequent econf
> invocation.

Oh, no it isn't :)

Should have read the code again -- it already uses ${WORKDIR}, not ${ECONF_SOURCE} to iterate.

So my original approach seems OK.  Sorry for the noise.

Hopefully I can churn this out pretty quickly, and do some basic testing in over the next few days, guys.  Presumably I should post my results to portage-devel as well as this bug?
Comment 7 Greg Turner 2013-12-17 22:01:27 UTC
(In reply to Greg Turner from comment #5)
> Forgive me thinking out loud here, but

[snip]

> this makes me think about a second assumption I've made in [my proposal]
> -- that inappropriate config.{sub,guess} files
> do not somehow "regenerate" in-between econf invocations.  My inclination is
> (it would seem) to cling to that assumption, but if anyone has any reason to
> question it, please let me know!

This part still stands, however.  Technically, my proposal changes the semantics of econf, insofar as, right now, econf will do the right thing even if bad config.{sub,guess} files regenerate in-between econf invocations, but with the proposed modifications, it no longer would.

If that's ever going to be a problem, then something will need to be tweaked.
Comment 8 SpanKY gentoo-dev 2013-12-17 23:15:51 UTC
it's fairly easy to do:
 - leverage `flock` and a subshell when updating the two files
 - use `cp;mv` to do an automatic rewrite

no need to hack anything else
Comment 9 Greg Turner 2013-12-18 00:51:09 UTC
(In reply to SpanKY from comment #8)
> it's fairly easy to do:
>  - leverage `flock` and a subshell when updating the two files

by this you mean config.{sub,guess}?  Note that we shuffle through these recursively in all of ${WORKDIR}...

>  - use `cp;mv` to do an automatic rewrite

here you mean the shebang replacement, correct?

> no need to hack anything else

Any existing flock dependencies in portage?  Concerned this might screw up some obscure prefix bootstrap somewhere.

This still doesn't protect against config.{sub,guess} replacement racing with the configure script itself... but..., not sure how afraid of that we should be.  What about just using cp;mv there, as well? They're sourced, after all....

Can anyone think of a reason that simply using "mv"-replacement for all the manipulations in econf wouldn't solve the entire problem?  I'm not sure I can, actually...
Comment 11 Greg Turner 2013-12-22 00:54:17 UTC
Closing, this is resolved.
Comment 12 Michał Górny archtester Gentoo Infrastructure gentoo-dev Security 2014-07-11 15:29:32 UTC
*** Bug 516892 has been marked as a duplicate of this bug. ***