I'm currently exploring Gentoo's recent expansion into providing a distro-level upstream binhost, using amd64 on older hardware, so not -v3 :-)

A problem I'm finding is that when application updates are available in portage there is (not surprisingly) a delay before new binaries are rebuilt and available for download from the binhost. This leads to the situation where portage offers source-based upgrades for applications where ordinarily pre-built binaries would be offered. This is clearly not a big deal for small packages, but the recent poppler/boost/libreoffice stabilisations would have meant ~10 hours of building on the hardware in question, which counteracts the reasons for using the binhost in the first place.

So, the request is: can consideration be given to extending portage to allow for warning/blocking where application upgrades are available in portage but for which binaries are not yet available from the configured mirror(s)?

(I am aware of the -g/-G options, obviously, but these don't seem to cover the use-case described?)

Thanks (and hoping I haven't missed something obvious...)

Reproducible: Always
This is kind of bug 463964 but maybe we should keep it separate with a depends-on.
(In reply to Sam James from comment #1)
> This is kind of bug 463964 but maybe we should keep it separate with a
> depends-on.

Thanks, I hadn't previously seen that one
(In reply to Adrian Bassett from comment #2)
> (In reply to Sam James from comment #1)
> > This is kind of bug 463964 but maybe we should keep it separate with a
> > depends-on.
>
> Thanks, I hadn't previously seen that one

... but would add that I was not thinking of granularity beyond use/delay using binhost pkg v. compile locally if package is either not supported on binhost or there is a USE flag discrepancy, although those two effectively amount to the same thing.
A possibly useful mitigation for this issue would be to create a sync configuration that only updates to the latest revision of the gentoo ebuild repository that the binhost has finished processing. For example, it could be implemented using a git branch that the binhost infrastructure is responsible for updating when it has finished processing a particular revision of the gentoo master branch. How does that sound @dilfridge?
If we have a gentoo git branch that the binhost infrastructure maintains to match the state of mirrored binhosts, then we can add something about how to configure git sync from the binhost's branch near the binrepos.conf instructions: https://wiki.gentoo.org/wiki/Gentoo_Binary_Host_Quickstart#binrepos.conf Ideally, the git branch should be published at about the same time as the binhost updates are scheduled to arrive on mirrors. It's best if binhost updates are as atomic as possible, in order to minimize user exposure to inconsistent states that could trigger dependency conflicts.
Actually, we can implement the binhost branch on the client side if the binhost Packages index file contains a header carrying the git commit hash of the gentoo repo.
The gentoo git commit hash in the Packages header might be implemented in portage as a sort of intentional information leak (like the other information it leaks as reported in bug 912648).
An advantage of having a public git sync branch for this is that users can use sync-depth = 1 and it will fetch the correct revision. If we use a Packages header containing a git commit hash to implement the consistency on the client side, then a larger sync-depth will be required. An advantage of implementing the consistency on the client side is that it removes the burden of synchronizing the public git sync branch update with the mirroring of the corresponding binhost updates.
Another advantage of implementing the consistency on the client side is that we are practically guaranteed to find the commit hash referenced by binhost Packages file, without ever needing to retry sync of either the binhost repo or ebuild repo. If we use a public git sync branch for binhost users, there's a race to achieve a consistent state, so in theory we might need to retry if inconsistent state is detected. However, we should be able to sync the binhost repo just once, and then localize any retry in the ebuild repo git sync, and it should never have to retry more than once unless something has gone wrong and prevented updates to the public git sync branch for binhost users.
(In reply to Zac Medico from comment #7)
> The gentoo git commit hash in the Packages header might be implemented in
> portage as a sort of intentional information leak (like the other
> information it leaks as reported in bug 912648).

I suppose we could represent this as a json object that maps repo name to commit hash, and we can limit the repos it exposes to those for which packages exist in the Packages file.
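A rough sketch of that idea, in Python since that is what portage is written in. The function name, the argument names and the "repo" key on each package entry are hypothetical and only illustrate the filtering; this is not existing portage API:

```python
import json


def build_repo_revisions(packages, synced_revisions):
    """Return a JSON string mapping repo name -> commit hash.

    packages: iterable of dicts, each with a (hypothetical) "repo" key
        naming the repository the package was built from.
    synced_revisions: dict of repo name -> commit hash known to the builder.

    Only repos that actually have packages in the index are exposed.
    """
    repos_in_index = {pkg["repo"] for pkg in packages if pkg.get("repo")}
    exposed = {
        name: revision
        for name, revision in synced_revisions.items()
        if name in repos_in_index
    }
    return json.dumps(exposed, sort_keys=True)
```

With this shape, a builder that also has a private overlay configured would not leak that overlay's commit hash unless it actually publishes packages built from it.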
This binhost infrastructure does not necessarily need to use git sync in order for us to get the corresponding git commit, since we parse metadata/timestamp.commit for rsync sync:

https://gitweb.gentoo.org/proj/portage.git/commit/?id=0e1699ad6b3f8eec56fbd6dd6255ed1145e89dd5

commit 0e1699ad6b3f8eec56fbd6dd6255ed1145e89dd5
Author: Manuel Rüger <mrueg@gentoo.org>
Date:   2017-06-16 16:48:34 +0200

    emerge: Add head commit per repo to --info

    This adds the following to emerge --info output for git and rsync based repositories:

    Head commit of repository gentoo: 0518b330edac963f54f98df33391b8e7b9eaee4c

    Reviewed-By: Zac Medico <zmedico@gentoo.org>
I suppose we could sample the source repository git commit at the time that EbuildBinpkg injects it into the binarytree here: https://gitweb.gentoo.org/proj/portage.git/commit/?id=89df7574a355a245e19ba297c3685997eec6bbbe However, the git commit would then be incorrect if the repository was synced after the build started, so it's better if we make EbuildBuild record the git commit hash in the ${PORTAGE_BUILDDIR}/build-info directory where it also keeps a copy of the ebuild. I suppose we should also include commit hashes for any parent repositories that eclasses were inherited from.
The bug has been referenced in the following commit(s):

https://gitweb.gentoo.org/proj/portage.git/commit/?id=eea598a20b2db5ecbe3975dc96885f529ae54c1c

commit eea598a20b2db5ecbe3975dc96885f529ae54c1c
Author:     Zac Medico <zmedico@gentoo.org>
AuthorDate: 2024-03-09 21:22:35 +0000
Commit:     Zac Medico <zmedico@gentoo.org>
CommitDate: 2024-03-09 21:22:35 +0000

    __dyn_install: Record REPO_REVISIONS in build-info

    Record REPO_REVISIONS as a json object that maps repo name to revision for an ebuild's source repository and any repositories that eclasses were inherited from:

    $ cat /var/tmp/portage/sys-apps/portage-3.0.63/build-info/REPO_REVISIONS
    {"gentoo": "34875e30e73e33d3597d1101cdf97dc22729b268"}

    Ultimately the intention is to expose this information in binhost metadata so that clients can select consistent revisions of source repositories.

    Bug: https://bugs.gentoo.org/924772
    Signed-off-by: Zac Medico <zmedico@gentoo.org>

 bin/phase-functions.sh                             |  1 +
 lib/_emerge/EbuildPhase.py                         | 46 ++++++++++++++++++++++
 .../package/ebuild/_config/special_env_vars.py     |  1 +
 3 files changed, 48 insertions(+)
I'm thinking about how to merge REPO_REVISIONS values from individual packages into a global REPO_REVISIONS value for the packages index. When we do this, we need to ensure that newer revisions are not replaced with older revisions, for example if a package built against an older sync finishes building after other packages built against a newer sync have already merged their REPO_REVISIONS into the global REPO_REVISIONS value. One way to do this is to only merge revisions into the global REPO_REVISIONS value if they correspond to the currently synced repository state, which will serve to filter out older values.
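To make the filtering rule concrete, here is a minimal sketch. All names are hypothetical, and it assumes each of the three inputs is a plain dict mapping repo name to revision:

```python
def merge_repo_revisions(global_revisions, package_revisions, current_revisions):
    """Merge a package's REPO_REVISIONS into the global value.

    A package's revision for a repo is accepted only if it equals the
    revision that repo is currently synced to, so a late-finishing build
    against an older sync cannot overwrite a newer global value.
    """
    merged = dict(global_revisions)
    for repo, revision in package_revisions.items():
        if current_revisions.get(repo) == revision:
            merged[repo] = revision
    return merged
```

The trade-off of this rule is that a package built against a revision that is no longer current contributes nothing at all, even if the global value is older still.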
(In reply to Zac Medico from comment #14)
> One way to do this is to only merge revisions into the global REPO_REVISIONS
> value if they correspond to the currently sync repository state, which will
> serve to filter out older values.

The rsync sync-rcu option makes this a little tricky because running processes hold references to older snapshots. We can detect this case by checking if the repo location and user_location still refer to the same path.
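Something along these lines might do for the check. This is only a sketch under the assumption that user_location is the configured path (a symlink that sync-rcu re-points after each sync) and location is the snapshot path the running process resolved at startup; the helper name is made up:

```python
import os


def snapshot_is_current(location, user_location):
    """Return True if the snapshot this process holds is still the one
    that the user-facing repository path resolves to."""
    return os.path.realpath(location) == os.path.realpath(user_location)
```

If this returns False, the process is looking at an older snapshot and its idea of the "current" revision should not be trusted for merging.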
(In reply to Zac Medico from comment #15) FWIW, I suspect this option isn't very popular at the moment (which is a shame, as it's great). Not that we should ignore it, ofc.
I'm thinking about adding a log of recently synced repo revisions that we can use as a database to ensure that the binhost's exported REPO_REVISIONS always progress forward and never backward.
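As a sketch of how such a log could enforce forward progress: the function below is hypothetical and assumes the history for a repo is a list of revisions ordered newest first, which is an assumption of this example rather than a statement about any existing file format:

```python
def advance_revision(current, candidate, history):
    """Pick the revision to export for one repo.

    current: revision currently exported (or None if there is none yet).
    candidate: revision recorded by a newly added package.
    history: recently synced revisions for the repo, newest first.

    The candidate replaces the current value only when the history shows
    that it is newer, so the exported value never moves backward.
    """
    if current is None or current == candidate:
        return candidate
    if candidate not in history:
        # Unknown revision: we cannot prove it is newer, so keep what we have.
        return current
    if current not in history:
        # The current value has aged out of the log; the candidate is known-recent.
        return candidate
    # A lower index means a more recent sync.
    if history.index(candidate) < history.index(current):
        return candidate
    return current
```

The case where the current value has aged out of the log is a judgment call; treating the candidate as newer there is only one possible policy.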
Another side of the coin is that you could receive binary packages which were built with a *newer* version of the source repository, and REPO_REVISIONS will not give us a way to reject those. I notice that Amazon Linux 2023 has a mechanism to prevent you from installing new packages too early, via the dnf releasever: https://docs.aws.amazon.com/linux/al2023/ug/deterministic-upgrades-usage.html
(In reply to Zac Medico from comment #18)
> Another side of the coin is that you could receive binary packages which
> were build with a *newer* version of the source repository, and
> REPO_REVISIONS will not give us a way to reject those.
>
> I notice that Amazon Linux 2023 has a mechanism to prevent you from
> installing new packages too early, via the dnf releasever:
>
> https://docs.aws.amazon.com/linux/al2023/ug/deterministic-upgrades-usage.html

I've realized that REPO_REVISIONS will work fine as long as we keep our binhost indexes pinned during the course of a particular series of updates.
For the purposes of bug 932739, we can introduce a binrepos.conf "freeze" or "pause" attribute that will have a backward compatible default in the [DEFAULT] section, and you'll be able to temporarily freeze the binrepo index caches for consistent and reproducible dependency calculations.
The bug has been referenced in the following commit(s):

https://gitweb.gentoo.org/proj/portage.git/commit/?id=5aed7289d516fab5b63557da46348125eabab368

commit 5aed7289d516fab5b63557da46348125eabab368
Author:     Zac Medico <zmedico@gentoo.org>
AuthorDate: 2024-03-14 04:09:34 +0000
Commit:     Zac Medico <zmedico@gentoo.org>
CommitDate: 2024-05-25 22:08:15 +0000

    bintree: Add REPO_REVISIONS to package index header

    As a means for binhost clients to select source repo revisions which are consistent with binhosts, inject REPO_REVISIONS from a package into the index header, using a history of synced revisions to guarantee forward progress. This queries the relevant repos to check if any new revisions have appeared in the absence of a proper sync operation.

    Bug: https://bugs.gentoo.org/924772
    Signed-off-by: Zac Medico <zmedico@gentoo.org>

 lib/portage/dbapi/bintree.py              | 67 ++++++++++++++++++++++++++++-
 lib/portage/tests/sync/test_sync_local.py | 71 +++++++++++++++++++++++++------
 2 files changed, 124 insertions(+), 14 deletions(-)

https://gitweb.gentoo.org/proj/portage.git/commit/?id=71d9ce40be5bbf533a6d1b59c5a460621c3c91c4

commit 71d9ce40be5bbf533a6d1b59c5a460621c3c91c4
Author:     Zac Medico <zmedico@gentoo.org>
AuthorDate: 2024-03-14 04:09:21 +0000
Commit:     Zac Medico <zmedico@gentoo.org>
CommitDate: 2024-05-25 22:08:15 +0000

    Add get_repo_revision_history function and repo_revisions file

    The history of synced revisions is provided by a new get_repo_revision_history function and corresponding /var/lib/portage/repo_revisions file, with history limit currently capped at 25 revisions. If a change is detected and the current process has permission to update the repo_revisions file, then the file will be updated with any newly detected revisions. For volatile repos the revisions may be unordered, which makes them unusable for the purposes of the revision history, so the revisions of volatile repos are not tracked.

    This functions detects revisions which are not yet visible to the current process due to the sync-rcu option.

    The emaint revisions --purgerepos and --purgeallrepos options allow revisions for some or all repos to be easily purged from the history. For example, the emerge-webrsync script uses this emaint commmand to purge the revision history of the gentoo repo when the emerge-webrsync --revert option is used to roll back to a previous snapshot:

        emaint revisions --purgerepos="${repo_name}"

    Bug: https://bugs.gentoo.org/924772
    Signed-off-by: Zac Medico <zmedico@gentoo.org>

 bin/emerge-webrsync                               |   3 +-
 lib/portage/const.py                              |   1 +
 lib/portage/emaint/modules/meson.build            |   1 +
 lib/portage/emaint/modules/revisions/__init__.py  |  36 ++++++
 lib/portage/emaint/modules/revisions/meson.build  |   8 ++
 lib/portage/emaint/modules/revisions/revisions.py |  95 ++++++++++++++++
 lib/portage/sync/controller.py                    |   8 +-
 lib/portage/sync/meson.build                      |   1 +
 lib/portage/sync/revision_history.py              | 133 ++++++++++++++++++++++
 lib/portage/tests/sync/test_sync_local.py         |  75 +++++++++++-
 man/emaint.1                                      |  18 ++-
 man/portage.5                                     |  15 +++
 12 files changed, 387 insertions(+), 7 deletions(-)
Once portage-3.0.65 has been stabilized with REPO_REVISIONS support, we'll start seeing it appear in the binhosts. At that point, we can manually checkout the corresponding commit for consistency, manually toggle the binhost to frozen, and think about how we can automate the process.
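For the manual step, a small helper like the following could pull the commit out of a downloaded Packages index so it can be checked out by hand. It is a sketch only: the function name is made up, and it assumes the index starts with "KEY: value" header lines terminated by the first blank line, with REPO_REVISIONS holding a json object as in the binhost header quoted later in this thread:

```python
import json


def read_repo_revisions(index_text):
    """Return the REPO_REVISIONS mapping from a Packages index header.

    Only the header (everything before the first blank line) is scanned.
    Returns an empty dict when the header has no REPO_REVISIONS entry.
    """
    for line in index_text.splitlines():
        if not line.strip():
            break  # a blank line ends the header
        key, sep, value = line.partition(":")
        if sep and key.strip() == "REPO_REVISIONS":
            return json.loads(value.strip())
    return {}
```

The resulting hash for "gentoo" is what one would then check out in the local ebuild repository, which needs a clone deep enough to contain that commit.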
I found this in the upstream binhost header today:

REPO_REVISIONS: {"gentoo": "f65df60d300c372f0b0f005a1f758b63a1c6806d"}
TIMESTAMP: 1720863505

However, this commit includes a dev-qt/qtbase-6.7.2 stabilization for which binary packages were not yet available (I've skipped these updates in order to wait for the binhost to provide them):

> [ebuild U ] dev-python/apsw-3.46.0.1::gentoo [3.45.3.0::gentoo] USE="-debug -doc" PYTHON_TARGETS="python3_12 -python3_10 -python3_11 -python3_13" 892 KiB
> [ebuild U ] dev-qt/qtbase-6.7.2:6/6.7.2::gentoo [6.7.1:6/6.7.1::gentoo] USE="X concurrent cups dbus gtk gui icu libinput libproxy network nls opengl sql sqlite ssl udev vulkan wayland widgets xml (zstd) -accessibility -brotli -eglfs -evdev -gles2-only -gssapi -mysql -oci8 -odbc -postgres -renderdoc -sctp -test -tslib" 48,208 KiB
> [ebuild U ] dev-qt/qtwayland-6.7.2-r1:6/6.7.2::gentoo [6.7.1:6/6.7.1::gentoo] USE="qml vulkan -accessibility -compositor -test" 1,097 KiB
> [ebuild U ] dev-qt/qtsvg-6.7.2:6/6.7.2::gentoo [6.7.1:6/6.7.1::gentoo] USE="-test" 1,750 KiB
> [ebuild U ] dev-qt/qtshadertools-6.7.2:6/6.7.2::gentoo [6.7.1:6/6.7.1::gentoo] USE="-test" 1,086 KiB
> [ebuild U ] dev-qt/qtdeclarative-6.7.2:6/6.7.2::gentoo [6.7.1:6/6.7.1::gentoo] USE="jit network opengl sql ssl svg vulkan widgets -accessibility -qmlls" 34,795 KiB
> [ebuild U ] dev-qt/qttools-6.7.2:6/6.7.2::gentoo [6.7.1:6/6.7.1::gentoo] USE="assistant linguist opengl qdbus qml vulkan widgets (zstd) -clang -designer -distancefieldgenerator -gles2-only -pixeltool -qdoc -qtattributionsscanner -qtdiag -qtplugininfo" LLVM_SLOT="17 -15 -16 (-18)" 8,809 KiB
> [ebuild U ] dev-qt/qttranslations-6.7.2:6/6.7.2::gentoo [6.7.1:6/6.7.1::gentoo] 1,512 KiB
> [ebuild U ] dev-qt/qtimageformats-6.7.2:6/6.7.2::gentoo [6.7.1:6/6.7.1::gentoo] USE="mng -test" 1,929 KiB
The qtbase issue is actually because the binhost builder has already run, yet it was unable to build qt due to a solver failure. The partial log excerpt on the status mailing list suggests that a USE flag probably needs changing. I don't know that any conclusions can be drawn from this w.r.t. portage development. It's basically a specialized case of "the binhost no longer offers packages for XXX".
(In reply to Eli Schwartz from comment #24)
> The qtbase issue is actually because the binhost builder has run already and
> yet it was unable to build qt due to solver failure. The partial log excerpt
> on the status mailing list indicates probably a USE flag that needs changing.
>
> I don't know that any conclusions can be drawn from this w.r.t. portage
> development. It's basically a specialized case of "the binhost no longer
> offers packages for XXX".

It's possible to cope with this kind of failure on the server side by preventing the binhost updates from being distributed before such solver failures have been resolved. Basically, treat the solver failure as a fatal QA issue that prevents any changes from flowing to mirrors. Alternatively we can possibly cope on the client side by forcing --getbinpkgonly mode for packages we expect to come from the binhost, as noted in bug 463964 comment #9.
Bug 936287 comment 5 describes a way that we could modify the Packages index update behavior to keep its REPO_REVISIONS in sync with the source repository using a pseudo "frozen" state like the one added in bug 924772. In order to cope with missing package builds that could be expected to be available for the given REPO_REVISIONS as reported in comment 23, we could implement a conditional Packages index update that will occur only if REPO_REVISIONS remains unchanged in the updated version (obviously TIMESTAMP would change).
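The conditional update could look roughly like this. It is a sketch with hypothetical names, assuming each index is represented as a dict with a "header" dict whose REPO_REVISIONS value is the raw json string from the Packages file:

```python
import json


def _revisions(header):
    """Parse the REPO_REVISIONS header value, treating a missing one as empty."""
    raw = header.get("REPO_REVISIONS")
    return json.loads(raw) if raw else {}


def conditional_index_update(cached, fetched):
    """Return the index that should be used after a fetch.

    The fetched index replaces the cached one only when its REPO_REVISIONS
    is unchanged, which picks up late-arriving builds for the same source
    revisions while leaving the pinned revisions alone. TIMESTAMP and the
    package list are allowed to differ.
    """
    if _revisions(cached["header"]) == _revisions(fetched["header"]):
        return fetched
    return cached
```

Comparing the parsed objects rather than the raw strings avoids treating a mere change in json key order as a revision change.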