Bug 924772 - Feature request: portage currently lacks functionality to warn/block upgrades when binhost packages not yet available
Status: IN_PROGRESS
Alias: None
Product: Portage Development
Classification: Unclassified
Component: Enhancement/Feature Requests
Hardware: AMD64 Linux
Importance: Normal normal
Assignee: Portage team
URL:
Whiteboard:
Keywords:
Depends on: 463964 932739 935697
Blocks: 240187
Reported: 2024-02-17 13:46 UTC by Adrian Bassett
Modified: 2024-10-25 10:59 UTC (History)
16 users (show)

See Also:
Package list:
Runtime testing required: ---


Attachments

Description Adrian Bassett 2024-02-17 13:46:17 UTC
I'm currently exploring Gentoo's recent expansion into providing a distro-level upstream binhost, using amd64 on older hardware, so not -v3 :-)

A problem I'm finding is that when application updates are available in portage there is (not surprisingly) a delay before new binaries are rebuilt and available for download from the binhost.

This leads to the situation where portage offers source-based upgrades for applications where ordinarily pre-built binaries would be offered.

This is clearly not a big deal for small packages, but the recent poppler/boost/libreoffice stabilisations would have meant ~10 hours of building on the hardware in question, on the one hand, and counteracts the reasons for using the binhost in the first place, on the other.

So, the request is:  can consideration be given to extending portage to allow for warning/blocking where application upgrades are available in portage but for which binaries are not yet available from the configured mirror(s)?
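As a rough illustration of the requested check (not an existing Portage feature), the sketch below splits pending upgrades into those the binhost can already satisfy and those that would fall back to a source build. The `Upgrade` class, the `split_by_binary_availability` helper, the simplified index mapping, and the version numbers are all hypothetical, and USE flag matching is ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Upgrade:
    """A pending upgrade: package name plus the version the ebuild repo offers."""
    package: str
    version: str


def split_by_binary_availability(upgrades, binhost_versions):
    """Separate upgrades the binhost can satisfy from those needing a source build.

    binhost_versions maps package name -> set of versions in the binhost index
    (a hypothetical, simplified stand-in for the real Packages file).
    """
    available, missing = [], []
    for upgrade in upgrades:
        if upgrade.version in binhost_versions.get(upgrade.package, set()):
            available.append(upgrade)
        else:
            missing.append(upgrade)
    return available, missing


# Illustrative data only: the version numbers are made up.
pending = [Upgrade("app-text/poppler", "2.0"), Upgrade("dev-libs/boost", "2.0")]
index = {"app-text/poppler": {"2.0"}, "dev-libs/boost": {"1.0"}}

ok, blocked = split_by_binary_availability(pending, index)
for upgrade in blocked:
    print(f"warning: no binary yet for {upgrade.package}-{upgrade.version}")
```

A real implementation would also have to compare USE flags and other binary package metadata before treating a binary as usable.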

(I am aware of the -g/-G options, obviously, but these don't seem to cover the use-case described?)

Thanks (and hoping I haven't missed something obvious...)


Reproducible: Always
Comment 1 Sam James archtester Gentoo Infrastructure gentoo-dev Security 2024-02-18 06:27:17 UTC
This is kind of bug 463964 but maybe we should keep it separate with a depends-on.
Comment 2 Adrian Bassett 2024-02-18 09:44:07 UTC
(In reply to Sam James from comment #1)
> This is kind of bug 463964 but maybe we should keep it separate with a
> depends-on.

Thanks, I hadn't previously seen that one
Comment 3 Adrian Bassett 2024-02-18 10:08:08 UTC
(In reply to Adrian Bassett from comment #2)
> (In reply to Sam James from comment #1)
> > This is kind of bug 463964 but maybe we should keep it separate with a
> > depends-on.
> 
> Thanks, I hadn't previously seen that one
... but would add that I was not thinking of any granularity beyond choosing between using (or waiting for) the binhost package and compiling locally when a package is either not supported on the binhost or has a USE flag discrepancy, although those two cases effectively amount to the same thing.
Comment 4 Zac Medico gentoo-dev 2024-02-19 01:15:38 UTC
A possibly useful mitigation for this issue would be to create a sync configuration that only updates to the latest revision of the gentoo ebuild repository that the binhost has finished processing. For example, it could be implemented using a git branch that the binhost infrastructure is responsible for updating when it has finished processing a particular revision of the gentoo master branch.

How does that sound @dilfridge?
Comment 5 Zac Medico gentoo-dev 2024-02-19 02:39:39 UTC
If we have a gentoo git branch that the binhost infrastructure maintains to match the state of mirrored binhosts, then we can add something about how to configure git sync from the binhost's branch near the binrepos.conf instructions:

https://wiki.gentoo.org/wiki/Gentoo_Binary_Host_Quickstart#binrepos.conf

Ideally, the git branch should be published at about the same time as the binhost updates are scheduled to arrive on mirrors. It's best if binhost updates are as atomic as possible, in order to minimize user exposure to inconsistent states that could trigger dependency conflicts.
Comment 6 Zac Medico gentoo-dev 2024-02-19 02:41:48 UTC
Actually, we can implement the binhost branch on the client side if the binhost Packages index file contains a header for the git commit hash from the gentoo repo.
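A minimal sketch of the client side of this idea, assuming the index header is a run of "KEY: value" lines ending at the first blank line and that the commit hash is carried as a JSON value. The field name REPO_REVISIONS is the one this report eventually settles on; the helper names and the sample index text are hypothetical.

```python
import json


def parse_index_header(text):
    """Return the header fields of a Packages-style index as a dict.

    The header is taken to be the "KEY: value" lines before the first
    blank line; everything after that is ignored here.
    """
    header = {}
    for line in text.splitlines():
        if not line.strip():
            break
        key, sep, value = line.partition(":")
        if sep:
            header[key.strip()] = value.strip()
    return header


def repo_revisions(header):
    """Decode the REPO_REVISIONS field, or return {} if it is absent."""
    raw = header.get("REPO_REVISIONS")
    return json.loads(raw) if raw else {}


# Illustrative index text; the package entry below the header is made up.
INDEX = """\
REPO_REVISIONS: {"gentoo": "0518b330edac963f54f98df33391b8e7b9eaee4c"}
TIMESTAMP: 0

CPV: example-category/example-package-1.0
"""

header = parse_index_header(INDEX)
print(repo_revisions(header))
```

The client could then pin its ebuild repository to the returned commit before calculating dependencies.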
Comment 7 Zac Medico gentoo-dev 2024-02-19 02:56:06 UTC
The gentoo git commit hash in the Packages header might be implemented in portage as a sort of intentional information leak (like the other information it leaks as reported in bug 912648).
Comment 8 Zac Medico gentoo-dev 2024-02-19 03:05:14 UTC
An advantage of having a public git sync branch for this is that users can use sync-depth = 1 and it will fetch the correct revision. If we use a Packages header containing a git commit hash to implement the consistency on the client side, then a larger sync-depth will be required.

An advantage of implementing the consistency on the client side is that it removes the burden of synchronizing the public git sync branch update with the mirroring of the corresponding binhost updates.
Comment 9 Zac Medico gentoo-dev 2024-02-19 07:30:16 UTC
Another advantage of implementing the consistency on the client side is that we are practically guaranteed to find the commit hash referenced by the binhost Packages file, without ever needing to retry sync of either the binhost repo or the ebuild repo.

If we use a public git sync branch for binhost users, there's a race to achieve a consistent state, so in theory we might need to retry if inconsistent state is detected. However, we should be able to sync the binhost repo just once, and then localize any retry in the ebuild repo git sync, and it should never have to retry more than once unless something has gone wrong and prevented updates to the public git sync branch for binhost users.
Comment 10 Zac Medico gentoo-dev 2024-02-25 22:06:38 UTC
(In reply to Zac Medico from comment #7)
> The gentoo git commit hash in the Packages header might be implemented in
> portage as a sort of intentional information leak (like the other
> information it leaks as reported in bug 912648).

I suppose we could represent this as a json object that maps repo name to commit hash, and we can limit the repos it exposes to those for which packages exist in the Packages file.
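One way to picture that filtering, as a sketch only: the builder knows the revision of every configured repo, but exports just the ones that own a package listed in the index. The `exported_revisions` helper, the `(cpv, repo)` pair layout, and the "private-overlay" entry are hypothetical.

```python
import json


def exported_revisions(all_revisions, packages):
    """Keep only revisions for repos that own a package in the index.

    all_revisions: repo name -> commit hash known to the builder.
    packages: iterable of (cpv, repo_name) pairs listed in the index.
    """
    used = {repo for _cpv, repo in packages}
    return {repo: rev for repo, rev in all_revisions.items() if repo in used}


revisions = {
    "gentoo": "0518b330edac963f54f98df33391b8e7b9eaee4c",
    "private-overlay": "0" * 40,  # placeholder hash for a repo with no packages
}
packages = [("example-category/example-package-1.0", "gentoo")]

header_value = json.dumps(exported_revisions(revisions, packages), sort_keys=True)
print(f"REPO_REVISIONS: {header_value}")
```

Limiting the export this way avoids leaking revisions of repositories that contributed nothing to the binhost.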
Comment 11 Zac Medico gentoo-dev 2024-02-26 00:26:56 UTC
The binhost infrastructure does not necessarily need to use git sync in order for us to get the corresponding git commit, since we parse metadata/timestamp.commit for rsync sync:

https://gitweb.gentoo.org/proj/portage.git/commit/?id=0e1699ad6b3f8eec56fbd6dd6255ed1145e89dd5

commit 0e1699ad6b3f8eec56fbd6dd6255ed1145e89dd5
Author: Manuel Rüger <mrueg@gentoo.org>
Date:   2017-06-16 16:48:34 +0200

    emerge: Add head commit per repo to --info
    
    This adds the following to emerge --info output for git and rsync based
    repositories:
    
    Head commit of repository gentoo: 0518b330edac963f54f98df33391b8e7b9eaee4c
    
    Reviewed-By: Zac Medico <zmedico@gentoo.org>
Comment 12 Zac Medico gentoo-dev 2024-02-26 01:31:08 UTC
I suppose we could sample the source repository git commit at the time that EbuildBinpkg injects it into the binarytree here:

https://gitweb.gentoo.org/proj/portage.git/commit/?id=89df7574a355a245e19ba297c3685997eec6bbbe

However, the git commit would then be incorrect if the repository was synced after the build started, so it's better if we make EbuildBuild record the git commit hash in the ${PORTAGE_BUILDDIR}/build-info directory where it also keeps a copy of the ebuild. I suppose we should also include commit hashes for any parent repositories that eclasses were inherited from.
Comment 13 Larry the Git Cow gentoo-dev 2024-03-09 22:17:17 UTC
The bug has been referenced in the following commit(s):

https://gitweb.gentoo.org/proj/portage.git/commit/?id=eea598a20b2db5ecbe3975dc96885f529ae54c1c

commit eea598a20b2db5ecbe3975dc96885f529ae54c1c
Author:     Zac Medico <zmedico@gentoo.org>
AuthorDate: 2024-03-09 21:22:35 +0000
Commit:     Zac Medico <zmedico@gentoo.org>
CommitDate: 2024-03-09 21:22:35 +0000

    __dyn_install: Record REPO_REVISIONS in build-info
    
    Record REPO_REVISIONS as a json object that maps repo name to
    revision for an ebuild's source repository and any repositories
    that eclasses were inherited from:
    
    $ cat /var/tmp/portage/sys-apps/portage-3.0.63/build-info/REPO_REVISIONS
    {"gentoo": "34875e30e73e33d3597d1101cdf97dc22729b268"}
    
    Ultimately the intention is to expose this information in binhost
    metadata so that clients can select consistent revisions of source
    repositories.
    
    Bug: https://bugs.gentoo.org/924772
    Signed-off-by: Zac Medico <zmedico@gentoo.org>

 bin/phase-functions.sh                             |  1 +
 lib/_emerge/EbuildPhase.py                         | 46 ++++++++++++++++++++++
 .../package/ebuild/_config/special_env_vars.py     |  1 +
 3 files changed, 48 insertions(+)
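The recording step can be sketched roughly as follows. This is not the code from the commit (which touches bin/phase-functions.sh and lib/_emerge/EbuildPhase.py); the helper names and the `head_commits` argument are hypothetical, and only the file name and the JSON shape are taken from the commit message.

```python
import json
import tempfile
from pathlib import Path


def collect_repo_revisions(ebuild_repo, eclass_repos, head_commits):
    """Map the ebuild's repo and every eclass repo to its sampled commit.

    head_commits: repo name -> commit hash sampled for this build.
    Repos without a known commit are skipped rather than guessed.
    """
    wanted = [ebuild_repo, *eclass_repos]
    return {name: head_commits[name] for name in wanted if name in head_commits}


def write_build_info(build_info_dir, revisions):
    """Write the mapping as one line of JSON to build-info/REPO_REVISIONS."""
    path = Path(build_info_dir) / "REPO_REVISIONS"
    path.write_text(json.dumps(revisions, sort_keys=True) + "\n")
    return path


# A temporary directory stands in for ${PORTAGE_BUILDDIR}/build-info.
with tempfile.TemporaryDirectory() as tmp:
    revs = collect_repo_revisions(
        "gentoo",
        ["gentoo"],
        {"gentoo": "34875e30e73e33d3597d1101cdf97dc22729b268"},
    )
    written = write_build_info(tmp, revs).read_text()

print(written, end="")
```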
Comment 14 Zac Medico gentoo-dev 2024-03-10 00:57:34 UTC
I'm thinking about how to merge REPO_REVISIONS values from individual packages into a global REPO_REVISIONS value for the packages index. When we do this, we need to ensure that newer revisions are not replaced with older revisions, for example if a package built against an older sync finishes building after other packages built against a newer sync have already merged their REPO_REVISIONS into the global REPO_REVISIONS value.

One way to do this is to only merge revisions into the global REPO_REVISIONS value if they correspond to the currently synced repository state, which will serve to filter out older values.
Comment 15 Zac Medico gentoo-dev 2024-03-10 01:43:24 UTC
(In reply to Zac Medico from comment #14)
> One way to do this is to only merge revisions into the global REPO_REVISIONS
> value if they correspond to the currently sync repository state, which will
> serve to filter out older values.

The rsync sync-rcu option makes this a little tricky because running processes hold references to older snapshots. We can detect this case by checking if the repo location and user_location still refer to the same path.
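A small sketch of that check, under the assumption that sync-rcu publishes each snapshot behind a stable path (modelled here as a symlink) so that a long-running process can be left holding a stale snapshot. The `snapshot_is_current` helper and the directory layout are hypothetical; only the idea of comparing the two paths comes from the comment above.

```python
import os
import tempfile


def snapshot_is_current(location, user_location):
    """Return True if user_location still resolves to the snapshot at location."""
    return os.path.realpath(location) == os.path.realpath(user_location)


with tempfile.TemporaryDirectory() as tmp:
    old_snapshot = os.path.join(tmp, "snapshot-1")
    new_snapshot = os.path.join(tmp, "snapshot-2")
    os.mkdir(old_snapshot)
    os.mkdir(new_snapshot)
    user_location = os.path.join(tmp, "gentoo")

    # The process starts while the stable path points at the first snapshot.
    os.symlink(old_snapshot, user_location)
    before_sync = snapshot_is_current(old_snapshot, user_location)

    # Simulate a sync that publishes a new snapshot behind the same path.
    os.remove(user_location)
    os.symlink(new_snapshot, user_location)
    after_sync = snapshot_is_current(old_snapshot, user_location)

print(before_sync, after_sync)
```

When the check fails, revisions seen by the running process belong to an older snapshot and should not be merged into the global value.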
Comment 16 Sam James archtester Gentoo Infrastructure gentoo-dev Security 2024-03-10 01:46:01 UTC
(In reply to Zac Medico from comment #15)

FWIW, I suspect this option isn't very popular at the moment (which is a shame, as it's great). Not that we should ignore it, ofc.
Comment 17 Zac Medico gentoo-dev 2024-03-10 21:23:51 UTC
I'm thinking about adding a log of recently synced repo revisions that we can use as a database to ensure that the binhost's exported REPO_REVISIONS always progress forward and never backward.
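To make the forward-only rule concrete, here is a minimal sketch that assumes the log is a list of revisions ordered newest first. The `merge_forward` helper and the short revision names are hypothetical stand-ins for commit hashes.

```python
def merge_forward(current, candidate, history):
    """Return the revision to export, never moving backward in history.

    history lists synced revisions newest first. A candidate is accepted
    only if it appears in the log and is not older than the current value.
    """
    if candidate not in history:
        return current
    if current not in history:
        return candidate
    # A smaller index means a more recent sync.
    if history.index(candidate) <= history.index(current):
        return candidate
    return current


history = ["rev-c", "rev-b", "rev-a"]  # newest first

# A late-finishing build against an older sync must not win.
print(merge_forward("rev-b", "rev-a", history))
# A build against a newer sync moves the exported value forward.
print(merge_forward("rev-b", "rev-c", history))
```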
Comment 18 Zac Medico gentoo-dev 2024-05-24 22:11:34 UTC
Another side of the coin is that you could receive binary packages which were built with a *newer* version of the source repository, and REPO_REVISIONS will not give us a way to reject those.

I notice that Amazon Linux 2023 has a mechanism to prevent you from installing new packages too early, via the dnf releasever:

https://docs.aws.amazon.com/linux/al2023/ug/deterministic-upgrades-usage.html
Comment 19 Zac Medico gentoo-dev 2024-05-25 03:10:58 UTC
(In reply to Zac Medico from comment #18)
> Another side of the coin is that you could receive binary packages which
> were built with a *newer* version of the source repository, and
> REPO_REVISIONS will not give us a way to reject those.
> 
> I notice that Amazon Linux 2023 has a mechanism to prevent you from
> installing new packages too early, via the dnf releasever:
> 
> https://docs.aws.amazon.com/linux/al2023/ug/deterministic-upgrades-usage.html

I've realized that REPO_REVISIONS will work fine as long as we keep our binhost indexes pinned during the course of a particular series of updates.
Comment 20 Zac Medico gentoo-dev 2024-05-25 16:35:57 UTC
For the purposes of bug 932739, we can introduce a binrepos.conf "freeze" or "pause" attribute that will have a backward compatible default in the [DEFAULT] section, and you'll be able to temporarily freeze the binrepo index caches for consistent and reproducible dependency calculations.
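The [DEFAULT] fallback can be illustrated with Python's standard configparser, which applies values from [DEFAULT] to every section that does not override them. The attribute name `frozen`, the section names, and the example.invalid URIs are assumptions for the sake of the sketch, not confirmed binrepos.conf syntax.

```python
from configparser import ConfigParser

# Hypothetical binrepos.conf-style content.
EXAMPLE = """
[DEFAULT]
frozen = no

[binhost-a]
sync-uri = https://example.invalid/binhost-a
frozen = yes

[binhost-b]
sync-uri = https://example.invalid/binhost-b
"""


def frozen_repos(text):
    """Map each binrepo section to its effective frozen setting."""
    parser = ConfigParser()
    parser.read_string(text)
    # sections() excludes [DEFAULT]; getboolean falls back to its values.
    return {name: parser.getboolean(name, "frozen") for name in parser.sections()}


print(frozen_repos(EXAMPLE))
```

Because the default is "no", existing configurations that never mention the attribute keep their current behavior.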
Comment 21 Larry the Git Cow gentoo-dev 2024-05-25 23:14:52 UTC
The bug has been referenced in the following commit(s):

https://gitweb.gentoo.org/proj/portage.git/commit/?id=5aed7289d516fab5b63557da46348125eabab368

commit 5aed7289d516fab5b63557da46348125eabab368
Author:     Zac Medico <zmedico@gentoo.org>
AuthorDate: 2024-03-14 04:09:34 +0000
Commit:     Zac Medico <zmedico@gentoo.org>
CommitDate: 2024-05-25 22:08:15 +0000

    bintree: Add REPO_REVISIONS to package index header
    
    As a means for binhost clients to select source repo
    revisions which are consistent with binhosts, inject
    REPO_REVISIONS from a package into the index header,
    using a history of synced revisions to guarantee
    forward progress. This queries the relevant repos to
    check if any new revisions have appeared in the
    absence of a proper sync operation.
    
    Bug: https://bugs.gentoo.org/924772
    Signed-off-by: Zac Medico <zmedico@gentoo.org>

 lib/portage/dbapi/bintree.py              | 67 ++++++++++++++++++++++++++++-
 lib/portage/tests/sync/test_sync_local.py | 71 +++++++++++++++++++++++++------
 2 files changed, 124 insertions(+), 14 deletions(-)

https://gitweb.gentoo.org/proj/portage.git/commit/?id=71d9ce40be5bbf533a6d1b59c5a460621c3c91c4

commit 71d9ce40be5bbf533a6d1b59c5a460621c3c91c4
Author:     Zac Medico <zmedico@gentoo.org>
AuthorDate: 2024-03-14 04:09:21 +0000
Commit:     Zac Medico <zmedico@gentoo.org>
CommitDate: 2024-05-25 22:08:15 +0000

    Add get_repo_revision_history function and repo_revisions file
    
    The history of synced revisions is provided by a new
    get_repo_revision_history function and corresponding
    /var/lib/portage/repo_revisions file, with history
    limit currently capped at 25 revisions. If a change
    is detected and the current process has permission
    to update the repo_revisions file, then the file will
    be updated with any newly detected revisions.
    For volatile repos the revisions may be unordered,
    which makes them unusable for the purposes of the
    revision history, so the revisions of volatile repos
    are not tracked. This function detects revisions
    which are not yet visible to the current process due
    to the sync-rcu option.
    
    The emaint revisions --purgerepos and --purgeallrepos
    options allow revisions for some or all repos to be
    easily purged from the history. For example, the
    emerge-webrsync script uses this emaint command to
    purge the revision history of the gentoo repo when
    the emerge-webrsync --revert option is used to roll
    back to a previous snapshot:
    
        emaint revisions --purgerepos="${repo_name}"
    
    Bug: https://bugs.gentoo.org/924772
    Signed-off-by: Zac Medico <zmedico@gentoo.org>

 bin/emerge-webrsync                               |   3 +-
 lib/portage/const.py                              |   1 +
 lib/portage/emaint/modules/meson.build            |   1 +
 lib/portage/emaint/modules/revisions/__init__.py  |  36 ++++++
 lib/portage/emaint/modules/revisions/meson.build  |   8 ++
 lib/portage/emaint/modules/revisions/revisions.py |  95 ++++++++++++++++
 lib/portage/sync/controller.py                    |   8 +-
 lib/portage/sync/meson.build                      |   1 +
 lib/portage/sync/revision_history.py              | 133 ++++++++++++++++++++++
 lib/portage/tests/sync/test_sync_local.py         |  75 +++++++++++-
 man/emaint.1                                      |  18 ++-
 man/portage.5                                     |  15 +++
 12 files changed, 387 insertions(+), 7 deletions(-)
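The bounded history and the purge operation described in the commit message can be sketched like this. It is not the code from lib/portage/sync/revision_history.py; the in-memory dict layout (repo name to a newest-first list) and the helper names are assumptions, and only the limit of 25 comes from the commit message.

```python
HISTORY_LIMIT = 25  # the cap mentioned in the commit message


def record_revision(history, repo, revision, limit=HISTORY_LIMIT):
    """Prepend a newly seen revision for repo and drop entries past the limit.

    Returns True if the history changed, False if the revision was
    already the most recent entry.
    """
    revisions = history.setdefault(repo, [])
    if revisions and revisions[0] == revision:
        return False
    revisions.insert(0, revision)
    del revisions[limit:]
    return True


def purge_repos(history, repos):
    """Forget all recorded revisions for the named repos."""
    for repo in repos:
        history.pop(repo, None)


history = {}
for number in range(30):
    record_revision(history, "gentoo", f"rev-{number}")

print(len(history["gentoo"]), history["gentoo"][0])

# Rolling back to an older snapshot invalidates the ordering, so purge.
purge_repos(history, ["gentoo"])
print(history)
```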
Comment 22 Zac Medico gentoo-dev 2024-05-25 23:27:32 UTC
Once portage-3.0.65 has been stabilized with REPO_REVISIONS support, we'll start seeing it appear in the binhosts. At that point, we can manually checkout the corresponding commit for consistency, manually toggle the binhost to frozen, and think about how we can automate the process.
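The manual checkout step could be scripted along these lines. The sketch only builds the git command lines rather than running them; the `checkout_commands` helper and the repo location mapping are hypothetical, and with a shallow clone (for example sync-depth = 1) the commit may need to be fetched first.

```python
import json
import shlex


def checkout_commands(repo_revisions_json, repo_locations):
    """Build one detached git checkout per repo named in REPO_REVISIONS.

    repo_locations maps repo name -> local path; repos without a known
    location are skipped.
    """
    revisions = json.loads(repo_revisions_json)
    commands = []
    for repo, commit in sorted(revisions.items()):
        location = repo_locations.get(repo)
        if location is None:
            continue
        commands.append(["git", "-C", location, "checkout", "--detach", commit])
    return commands


header_value = '{"gentoo": "34875e30e73e33d3597d1101cdf97dc22729b268"}'
locations = {"gentoo": "/var/db/repos/gentoo"}  # assumed repo location

commands = checkout_commands(header_value, locations)
for command in commands:
    print(shlex.join(command))
```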
Comment 23 Zac Medico gentoo-dev 2024-07-14 01:07:01 UTC
I found this in the upstream binhost header today:

REPO_REVISIONS: {"gentoo": "f65df60d300c372f0b0f005a1f758b63a1c6806d"}
TIMESTAMP: 1720863505

However, this commit includes a dev-qt/qtbase-6.7.2 stabilization for which binary packages were not yet available (I've skipped these updates in order to wait for the binhost to provide them):

> [ebuild     U  ] dev-python/apsw-3.46.0.1::gentoo [3.45.3.0::gentoo] USE="-debug -doc" PYTHON_TARGETS="python3_12 -python3_10 -python3_11 -python3_13" 892 KiB              
> [ebuild     U  ] dev-qt/qtbase-6.7.2:6/6.7.2::gentoo [6.7.1:6/6.7.1::gentoo] USE="X concurrent cups dbus gtk gui icu libinput libproxy network nls opengl sql sqlite ssl udev vulkan wayland widgets xml (zstd) -accessibility -brotli -eglfs -evdev -gles2-only -gssapi -mysql -oci8 -odbc -postgres -renderdoc -sctp -test -tslib" 48,208 KiB         
> [ebuild     U  ] dev-qt/qtwayland-6.7.2-r1:6/6.7.2::gentoo [6.7.1:6/6.7.1::gentoo] USE="qml vulkan -accessibility -compositor -test" 1,097 KiB                              
> [ebuild     U  ] dev-qt/qtsvg-6.7.2:6/6.7.2::gentoo [6.7.1:6/6.7.1::gentoo] USE="-test" 1,750 KiB                                                                           
> [ebuild     U  ] dev-qt/qtshadertools-6.7.2:6/6.7.2::gentoo [6.7.1:6/6.7.1::gentoo] USE="-test" 1,086 KiB                                                                   
> [ebuild     U  ] dev-qt/qtdeclarative-6.7.2:6/6.7.2::gentoo [6.7.1:6/6.7.1::gentoo] USE="jit network opengl sql ssl svg vulkan widgets -accessibility -qmlls" 34,795 KiB    
> [ebuild     U  ] dev-qt/qttools-6.7.2:6/6.7.2::gentoo [6.7.1:6/6.7.1::gentoo] USE="assistant linguist opengl qdbus qml vulkan widgets (zstd) -clang -designer -distancefieldgenerator -gles2-only -pixeltool -qdoc -qtattributionsscanner -qtdiag -qtplugininfo" LLVM_SLOT="17 -15 -16 (-18)" 8,809 KiB
> [ebuild     U  ] dev-qt/qttranslations-6.7.2:6/6.7.2::gentoo [6.7.1:6/6.7.1::gentoo] 1,512 KiB
> [ebuild     U  ] dev-qt/qtimageformats-6.7.2:6/6.7.2::gentoo [6.7.1:6/6.7.1::gentoo] USE="mng -test" 1,929 KiB
Comment 24 Eli Schwartz gentoo-dev 2024-07-14 18:30:25 UTC
The qtbase issue is actually because the binhost builder has already run, yet it was unable to build qt due to a solver failure. The partial log excerpt on the status mailing list suggests that a USE flag probably needs changing.

I don't know that any conclusions can be drawn from this w.r.t. portage development. It's basically a specialized case of "the binhost no longer offers packages for XXX".
Comment 25 Zac Medico gentoo-dev 2024-07-14 18:47:28 UTC
(In reply to Eli Schwartz from comment #24)
> The qtbase issue is actually because the binhost builder has run already and
> yet it was unable to build qt due to solver failure. The partial log excerpt
> on the status mailing list indicates probably a USE flag that needs changing.
> 
> I don't know that any conclusions can be drawn from this w.r.t. portage
> development. It's basically a specialized case of "the binhost no longer
> offers packages for XXX".

It's possible to cope with this kind of failure on the server side by preventing the binhost updates from being distributed before such solver failures have been resolved. Basically, treat the solver failure as a fatal QA issue that prevents any changes from flowing to mirrors.

Alternatively we can possibly cope on the client side by forcing --getbinpkgonly mode for packages we expect to come from the binhost, as noted in bug 463964 comment #9.
Comment 26 Zac Medico gentoo-dev 2024-07-29 16:34:31 UTC
Bug 936287 comment 5 describes a way that we could modify the Packages index update behavior to keep its REPO_REVISIONS in sync with the source repository using a pseudo "frozen" state like the one added in bug 924772.

In order to cope with missing package builds that could be expected to be available for the given REPO_REVISIONS as reported in comment 23, we could implement a conditional Packages index update that will occur only if REPO_REVISIONS remains unchanged in the updated version (obviously TIMESTAMP would change).
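A sketch of that conditional update, assuming both index headers are available as dicts of strings and that REPO_REVISIONS holds JSON. The helper names, the later TIMESTAMP value, and the all-zero placeholder hash are hypothetical.

```python
import json


def revisions_of(header):
    """Decode REPO_REVISIONS from a header dict, treating absence as empty."""
    return json.loads(header.get("REPO_REVISIONS", "{}"))


def accept_index_update(cached, fetched):
    """Accept a fetched index only if its REPO_REVISIONS matches the cache.

    Comparing decoded JSON avoids false mismatches from key order or
    whitespace; TIMESTAMP is deliberately ignored because it is expected
    to change.
    """
    return revisions_of(cached) == revisions_of(fetched)


cached = {
    "REPO_REVISIONS": '{"gentoo": "f65df60d300c372f0b0f005a1f758b63a1c6806d"}',
    "TIMESTAMP": "1720863505",
}
# Same source revision, later timestamp: newly built packages have arrived.
same_revision = dict(cached, TIMESTAMP="1720867105")
# Different source revision: adopting it would break the pinned state.
moved_on = {
    "REPO_REVISIONS": json.dumps({"gentoo": "0" * 40}),
    "TIMESTAMP": "1720867105",
}

print(accept_index_update(cached, same_revision))
print(accept_index_update(cached, moved_on))
```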