Bug 924772 - Feature request: portage currently lacks functionality to warn/block upgrades when binhost packages not yet available
Status: IN_PROGRESS
Alias: None
Product: Portage Development
Classification: Unclassified
Component: Enhancement/Feature Requests
Hardware: AMD64 Linux
Importance: Normal normal
Assignee: Portage team
URL:
Whiteboard:
Keywords: PullRequest
Depends on: 463964
Blocks: 240187
Reported: 2024-02-17 13:46 UTC by Adrian Bassett
Modified: 2024-04-04 18:27 UTC
CC List: 7 users

See Also:
Package list:
Runtime testing required: ---


Attachments

Description Adrian Bassett 2024-02-17 13:46:17 UTC
I'm currently exploring Gentoo's recent expansion into providing a distro-level upstream binhost, using amd64 on older hardware, so not -v3 :-)

A problem I'm finding is that when application updates are available in portage there is (not surprisingly) a delay before new binaries are rebuilt and available for download from the binhost.

This leads to the situation where portage offers source-based upgrades for applications where ordinarily pre-built binaries would be offered.

This is clearly not a big deal for small packages, but the recent poppler/boost/libreoffice stabilisations would have meant ~10 hours of building on the hardware in question, which also defeats the purpose of using the binhost in the first place.

So, the request is: could portage be extended to warn about or block application upgrades that are available in portage but for which binaries are not yet available from the configured mirror(s)?

(I am aware of the -g/-G options, obviously, but these don't seem to cover the use-case described?)

Thanks (and hoping I haven't missed something obvious...)


Reproducible: Always
Comment 1 Sam James archtester Gentoo Infrastructure gentoo-dev Security 2024-02-18 06:27:17 UTC
This is kind of bug 463964 but maybe we should keep it separate with a depends-on.
Comment 2 Adrian Bassett 2024-02-18 09:44:07 UTC
(In reply to Sam James from comment #1)
> This is kind of bug 463964 but maybe we should keep it separate with a
> depends-on.

Thanks, I hadn't previously seen that one
Comment 3 Adrian Bassett 2024-02-18 10:08:08 UTC
(In reply to Adrian Bassett from comment #2)
> (In reply to Sam James from comment #1)
> > This is kind of bug 463964 but maybe we should keep it separate with a
> > depends-on.
> 
> Thanks, I hadn't previously seen that one
... but I would add that I was not thinking of any granularity beyond this: use (or wait for) the binhost package, versus compile locally if the package is either not supported on the binhost or there is a USE flag discrepancy, although those two cases effectively amount to the same thing.
Comment 4 Zac Medico gentoo-dev 2024-02-19 01:15:38 UTC
A possibly useful mitigation for this issue would be to create a sync configuration that only updates to the latest revision of the gentoo ebuild repository that the binhost has finished processing. For example, it could be implemented using a git branch that the binhost infrastructure is responsible for updating when it has finished processing a particular revision of the gentoo master branch.

How does that sound @dilfridge?
Comment 5 Zac Medico gentoo-dev 2024-02-19 02:39:39 UTC
If we have a gentoo git branch that the binhost infrastructure maintains to match the state of mirrored binhosts, then we can add something about how to configure git sync from the binhost's branch near the binrepos.conf instructions:

https://wiki.gentoo.org/wiki/Gentoo_Binary_Host_Quickstart#binrepos.conf

Ideally, the git branch should be published at about the same time as the binhost updates are scheduled to arrive on mirrors. It's best if binhost updates are as atomic as possible, in order to minimize user exposure to inconsistent states that could trigger dependency conflicts.
Comment 6 Zac Medico gentoo-dev 2024-02-19 02:41:48 UTC
Actually, we can implement the binhost branch on the client side if the binhost Packages index file contains a header for the git commit hash from the gentoo repo.
Comment 7 Zac Medico gentoo-dev 2024-02-19 02:56:06 UTC
The gentoo git commit hash in the Packages header might be implemented in portage as a sort of intentional information leak (like the other information it leaks as reported in bug 912648).
Comment 8 Zac Medico gentoo-dev 2024-02-19 03:05:14 UTC
An advantage of having a public git sync branch for this is that users can use sync-depth = 1 and it will fetch the correct revision. If we use a Packages header containing a git commit hash to implement the consistency on the client side, then a larger sync-depth will be required.

An advantage of implementing the consistency on the client side is that it removes the burden of synchronizing the public git sync branch update with the mirroring of the corresponding binhost updates.
Comment 9 Zac Medico gentoo-dev 2024-02-19 07:30:16 UTC
Another advantage of implementing the consistency on the client side is that we are practically guaranteed to find the commit hash referenced by the binhost Packages file, without ever needing to retry sync of either the binhost repo or ebuild repo.

If we use a public git sync branch for binhost users, there's a race to achieve a consistent state, so in theory we might need to retry if an inconsistent state is detected. However, we should be able to sync the binhost repo just once and then confine any retry to the ebuild repo git sync. It should never have to retry more than once unless something has gone wrong and prevented updates to the public git sync branch for binhost users.
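
As a rough illustration of that retry strategy, here is a minimal sketch. The two callables are hypothetical stand-ins for whatever performs the actual syncs (they are not portage APIs), and each is assumed to return a mapping of repo name to revision.

```python
from typing import Callable, Dict


def sync_consistently(
    sync_binhost: Callable[[], Dict[str, str]],
    sync_ebuild_repo: Callable[[], Dict[str, str]],
    max_retries: int = 1,
) -> bool:
    """Sync the binhost once, then retry only the ebuild repo sync.

    Both callables are hypothetical and return {repo name: revision}.
    Returns True once the ebuild repo revisions match the binhost's.
    """
    wanted = sync_binhost()  # performed exactly once
    for _attempt in range(1 + max_retries):
        have = sync_ebuild_repo()
        if all(have.get(repo) == rev for repo, rev in wanted.items()):
            return True
    # Still inconsistent: the public branch was probably not updated.
    return False
```

With the default of one retry, a persistent mismatch is reported to the caller instead of looping, which matches the expectation that more than one retry indicates something has gone wrong.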
Comment 10 Zac Medico gentoo-dev 2024-02-25 22:06:38 UTC
(In reply to Zac Medico from comment #7)
> The gentoo git commit hash in the Packages header might be implemented in
> portage as a sort of intentional information leak (like the other
> information it leaks as reported in bug 912648).

I suppose we could represent this as a json object that maps repo name to commit hash, and we can limit the repos it exposes to those for which packages exist in the Packages file.
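
A client could read such a header along these lines. This is only a sketch: the header name `REPO_REVISIONS` is borrowed from the commit referenced later in this bug and is an assumption at this point in the discussion, and the index header is taken to be the `KEY: value` lines before the first blank line.

```python
import json
from typing import Dict


def parse_repo_revisions(
    packages_text: str, key: str = "REPO_REVISIONS"
) -> Dict[str, str]:
    """Return the {repo name: commit hash} mapping from the index header.

    The header key name is an assumption made for this sketch.
    """
    for line in packages_text.splitlines():
        if not line.strip():
            break  # end of header; package entries follow
        name, sep, value = line.partition(":")
        if sep and name.strip() == key:
            return json.loads(value.strip())
    return {}
```

An empty mapping is returned when the header is absent, so a client can fall back to an ordinary sync for binhosts that do not publish the information.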
Comment 11 Zac Medico gentoo-dev 2024-02-26 00:26:56 UTC
The binhost infrastructure does not necessarily need to use git sync in order for us to get the corresponding git commit, since we parse metadata/timestamp.commit for rsync sync:

https://gitweb.gentoo.org/proj/portage.git/commit/?id=0e1699ad6b3f8eec56fbd6dd6255ed1145e89dd5

commit 0e1699ad6b3f8eec56fbd6dd6255ed1145e89dd5
Author: Manuel Rüger <mrueg@gentoo.org>
Date:   2017-06-16 16:48:34 +0200

    emerge: Add head commit per repo to --info
    
    This adds the following to emerge --info output for git and rsync based
    repositories:
    
    Head commit of repository gentoo: 0518b330edac963f54f98df33391b8e7b9eaee4c
    
    Reviewed-By: Zac Medico <zmedico@gentoo.org>
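
For illustration, reading the head commit of an rsync-synced repository might look like the sketch below. It is not the code from the commit above, and it assumes that the hash is the first whitespace-separated field on the first line of metadata/timestamp.commit. The function name is hypothetical.

```python
import os
from typing import Optional


def read_head_commit(repo_location: str) -> Optional[str]:
    """Return the commit hash recorded in metadata/timestamp.commit, if any.

    Assumes the hash is the first whitespace-separated field of line one.
    """
    path = os.path.join(repo_location, "metadata", "timestamp.commit")
    try:
        with open(path, encoding="utf-8") as f:
            fields = f.readline().split()
    except OSError:
        return None  # file missing or unreadable
    return fields[0] if fields else None
```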
Comment 12 Zac Medico gentoo-dev 2024-02-26 01:31:08 UTC
I suppose we could sample the source repository git commit at the time that EbuildBinpkg injects it into the binarytree here:

https://gitweb.gentoo.org/proj/portage.git/commit/?id=89df7574a355a245e19ba297c3685997eec6bbbe

However, the git commit would then be incorrect if the repository was synced after the build started, so it's better if we make EbuildBuild record the git commit hash in the ${PORTAGE_BUILDDIR}/build-info directory where it also keeps a copy of the ebuild. I suppose we should also include commit hashes for any parent repositories that eclasses were inherited from.
Comment 13 Larry the Git Cow gentoo-dev 2024-03-09 22:17:17 UTC
The bug has been referenced in the following commit(s):

https://gitweb.gentoo.org/proj/portage.git/commit/?id=eea598a20b2db5ecbe3975dc96885f529ae54c1c

commit eea598a20b2db5ecbe3975dc96885f529ae54c1c
Author:     Zac Medico <zmedico@gentoo.org>
AuthorDate: 2024-03-09 21:22:35 +0000
Commit:     Zac Medico <zmedico@gentoo.org>
CommitDate: 2024-03-09 21:22:35 +0000

    __dyn_install: Record REPO_REVISIONS in build-info
    
    Record REPO_REVISIONS as a json object that maps repo name to
    revision for an ebuild's source repository and any repositories
    that eclasses were inherited from:
    
    $ cat /var/tmp/portage/sys-apps/portage-3.0.63/build-info/REPO_REVISIONS
    {"gentoo": "34875e30e73e33d3597d1101cdf97dc22729b268"}
    
    Ultimately the intention is to expose this information in binhost
    metadata so that clients can select consistent revisions of source
    repositories.
    
    Bug: https://bugs.gentoo.org/924772
    Signed-off-by: Zac Medico <zmedico@gentoo.org>

 bin/phase-functions.sh                             |  1 +
 lib/_emerge/EbuildPhase.py                         | 46 ++++++++++++++++++++++
 .../package/ebuild/_config/special_env_vars.py     |  1 +
 3 files changed, 48 insertions(+)
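
The recorded format can be sketched in a few lines. This is not the code from the commit above: `write_repo_revisions` and its parameters are hypothetical, and `head_revisions` is assumed to be a snapshot of repo revisions taken when the build starts, so that a sync during the build cannot change what is recorded.

```python
import json
import os
from typing import Dict, Iterable


def write_repo_revisions(
    build_info_dir: str,
    ebuild_repo: str,
    eclass_repos: Iterable[str],
    head_revisions: Dict[str, str],
) -> Dict[str, str]:
    """Write build-info/REPO_REVISIONS for one package build.

    head_revisions is a hypothetical {repo name: commit hash} snapshot.
    """
    revisions = {}
    # Cover the ebuild's own repo and any repos that eclasses came from.
    for repo in [ebuild_repo, *eclass_repos]:
        if repo in head_revisions:  # skip repos with no known revision
            revisions[repo] = head_revisions[repo]
    path = os.path.join(build_info_dir, "REPO_REVISIONS")
    with open(path, "w", encoding="utf-8") as f:
        json.dump(revisions, f)
        f.write("\n")
    return revisions
```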
Comment 14 Zac Medico gentoo-dev 2024-03-10 00:57:34 UTC
I'm thinking about how to merge REPO_REVISIONS values from individual packages into a global REPO_REVISIONS value for the packages index. When we do this, we need to ensure that newer revisions are not replaced with older revisions, for example if a package built against an older sync finishes building after other packages built against a newer sync have already merged their REPO_REVISIONS into the global REPO_REVISIONS value.

One way to do this is to only merge revisions into the global REPO_REVISIONS value if they correspond to the currently synced repository state, which will serve to filter out older values.
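
A minimal sketch of that filter, assuming the currently synced head of each repo is available as a mapping (`current_heads` and the function name are hypothetical):

```python
from typing import Dict


def merge_repo_revisions(
    global_revs: Dict[str, str],
    package_revs: Dict[str, str],
    current_heads: Dict[str, str],
) -> Dict[str, str]:
    """Merge one package's revisions into the global mapping.

    A revision is accepted only if it equals the currently synced head of
    its repo, so a package built against an older sync cannot roll the
    global value back.
    """
    merged = dict(global_revs)
    for repo, rev in package_revs.items():
        if current_heads.get(repo) == rev:
            merged[repo] = rev
    return merged
```

A package that finishes late with a stale revision simply leaves the global value untouched.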
Comment 15 Zac Medico gentoo-dev 2024-03-10 01:43:24 UTC
(In reply to Zac Medico from comment #14)
> One way to do this is to only merge revisions into the global REPO_REVISIONS
> value if they correspond to the currently synced repository state, which will
> serve to filter out older values.

The rsync sync-rcu option makes this a little tricky because running processes hold references to older snapshots. We can detect this case by checking if the repo location and user_location still refer to the same path.
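
That check might be sketched as follows. The assumption here is that `location` is the resolved snapshot path a running process started with, and that `user_location` is the configured path, which sync-rcu repoints to newer snapshots. The function name is hypothetical.

```python
import os


def snapshot_is_current(location: str, user_location: str) -> bool:
    """Return True if location still resolves to the same path as user_location."""
    return os.path.realpath(location) == os.path.realpath(user_location)
```

A process holding a reference to an older snapshot would get False here and could skip merging its revisions into the global value.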
Comment 16 Sam James archtester Gentoo Infrastructure gentoo-dev Security 2024-03-10 01:46:01 UTC
(In reply to Zac Medico from comment #15)

FWIW, I suspect this option isn't very popular at the moment (which is a shame, as it's great). Not that we should ignore it, ofc.
Comment 17 Zac Medico gentoo-dev 2024-03-10 21:23:51 UTC
I'm thinking about adding a log of recently synced repo revisions that we can use as a database to ensure that the binhost's exported REPO_REVISIONS always progress forward and never backward.
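
One way such a log could be used is sketched below. The data layout is an assumption: `history` maps each repo name to a list of recently synced revisions, oldest first, and the function name is hypothetical.

```python
from typing import Dict, List


def advance_revision(
    exported: Dict[str, str],
    repo: str,
    candidate: str,
    history: Dict[str, List[str]],
) -> bool:
    """Update exported[repo] to candidate only if that moves forward.

    history[repo] is a hypothetical list of recently synced revisions,
    oldest first. Revisions missing from the log are rejected.
    """
    log = history.get(repo, [])
    if candidate not in log:
        return False
    current = exported.get(repo)
    if current in log and log.index(candidate) <= log.index(current):
        return False  # same as or older than what is already exported
    exported[repo] = candidate
    return True
```

Rejecting revisions that are not in the log is a conservative choice for this sketch; a real implementation would need to decide how long the log is kept and what to do once the exported revision has aged out of it.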