I'm currently exploring gentoo's recent expansion into providing a distro-level upstream binhost, using amd64 on older hardware, so not -v3 :-) A problem I'm finding is that when application updates become available in portage there is (not surprisingly) a delay before new binaries are rebuilt and available for download from the binhost. During that window portage offers source-based upgrades for applications where ordinarily pre-built binaries would be offered. This is clearly not a big deal for small packages, but the recent poppler/boost/libreoffice stabilisations would have meant ~10 hours of building on the hardware in question, which defeats the purpose of using the binhost in the first place. So, the request is: can consideration be given to extending portage to allow for warning/blocking where application upgrades are available in portage but binaries for them are not yet available from the configured mirror(s)? (I am aware of the -g/-G options, obviously, but these don't seem to cover the use-case described?) Thanks (and hoping I haven't missed something obvious...)
Reproducible: Always
This is kind of bug 463964 but maybe we should keep it separate with a depends-on.
(In reply to Sam James from comment #1)
> This is kind of bug 463964 but maybe we should keep it separate with a
> depends-on.

Thanks, I hadn't previously seen that one
(In reply to Adrian Bassett from comment #2)
> (In reply to Sam James from comment #1)
> > This is kind of bug 463964 but maybe we should keep it separate with a
> > depends-on.
>
> Thanks, I hadn't previously seen that one

... but would add that I was not thinking of any granularity beyond a choice between using (or waiting for) the binhost package and compiling locally, the latter when a package is either not supported on the binhost or there is a USE flag discrepancy, although those two cases effectively amount to the same thing.
A possibly useful mitigation for this issue would be to create a sync configuration that only updates to the latest revision of the gentoo ebuild repository that the binhost has finished processing. For example, it could be implemented using a git branch that the binhost infrastructure is responsible for updating when it has finished processing a particular revision of the gentoo master branch. How does that sound @dilfridge?
If we have a gentoo git branch that the binhost infrastructure maintains to match the state of mirrored binhosts, then we can add something about how to configure git sync from the binhost's branch near the binrepos.conf instructions: https://wiki.gentoo.org/wiki/Gentoo_Binary_Host_Quickstart#binrepos.conf Ideally, the git branch should be published at about the same time as the binhost updates are scheduled to arrive on mirrors. It's best if binhost updates are as atomic as possible, in order to minimize user exposure to inconsistent states that could trigger dependency conflicts.
Actually, we can implement the binhost branch on the client side if the binhost Packages index file contains a header with the git commit hash of the gentoo repo that the packages were built from.
The gentoo git commit hash in the Packages header might be implemented in portage as a sort of intentional information leak (like the other information it leaks as reported in bug 912648).
An advantage of having a public git sync branch for this is that users can use sync-depth = 1 and it will fetch the correct revision. If we use a Packages header containing a git commit hash to implement the consistency on the client side, then a larger sync-depth will be required. An advantage of implementing the consistency on the client side is that it removes the burden of synchronizing the public git sync branch update with the mirroring of the corresponding binhost updates.
Another advantage of implementing the consistency on the client side is that we are practically guaranteed to find the commit hash referenced by the binhost Packages file, without ever needing to retry the sync of either the binhost repo or the ebuild repo. If we use a public git sync branch for binhost users, there's a race to achieve a consistent state, so in theory we might need to retry if an inconsistent state is detected. However, we should be able to sync the binhost repo just once, and then localize any retry in the ebuild repo git sync, and it should never have to retry more than once unless something has gone wrong and prevented updates to the public git sync branch for binhost users.
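The "sync the binhost once, retry only the ebuild repo" idea could be sketched roughly as follows. This is not portage code: `fetch_binhost_revision`, `sync_ebuild_repo` and `is_consistent` are hypothetical callables standing in for whatever portage would actually use, and the default of one retry simply mirrors the expectation stated above.

```python
from typing import Callable


def sync_consistently(
    fetch_binhost_revision: Callable[[], str],
    sync_ebuild_repo: Callable[[], None],
    is_consistent: Callable[[str], bool],
    max_retries: int = 1,
) -> bool:
    """Sync the binhost once, then retry only the ebuild repo sync.

    All three callables are hypothetical placeholders for this sketch.
    """
    # The binhost index is fetched exactly once; its revision is the target.
    wanted = fetch_binhost_revision()
    # One initial attempt plus at most max_retries retries of the repo sync.
    for _attempt in range(1 + max_retries):
        sync_ebuild_repo()
        if is_consistent(wanted):
            return True
    # Still inconsistent: the binhost sync branch is probably not updating.
    return False
```

A `False` return would correspond to the "something has gone wrong" case, where the caller might warn the user rather than keep retrying.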
(In reply to Zac Medico from comment #7)
> The gentoo git commit hash in the Packages header might be implemented in
> portage as a sort of intentional information leak (like the other
> information it leaks as reported in bug 912648).

I suppose we could represent this as a json object that maps repo name to commit hash, and we can limit the repos it exposes to those for which packages exist in the Packages file.
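On the client side, reading such a header could look something like the sketch below. It assumes the Packages index starts with `KEY: value` header lines terminated by a blank line, and it uses `REPO_REVISIONS` as a placeholder header name; both the name and the helper `read_repo_revisions` are assumptions for illustration, not an existing portage API.

```python
import json


def read_repo_revisions(packages_text: str) -> dict:
    """Return the repo name -> commit hash mapping from the index header.

    Assumes "KEY: value" header lines and a hypothetical REPO_REVISIONS
    field holding a json object.
    """
    for line in packages_text.splitlines():
        if not line.strip():
            break  # a blank line ends the header section
        key, sep, value = line.partition(":")
        if sep and key.strip() == "REPO_REVISIONS":
            return json.loads(value.strip())
    # Header absent: the binhost does not expose any revisions.
    return {}
```

An empty result lets a client fall back to its normal sync behaviour when the binhost does not publish the header.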
The binhost infrastructure does not necessarily need to use git sync in order for us to get the corresponding git commit, since we parse metadata/timestamp.commit for rsync sync:

https://gitweb.gentoo.org/proj/portage.git/commit/?id=0e1699ad6b3f8eec56fbd6dd6255ed1145e89dd5

commit 0e1699ad6b3f8eec56fbd6dd6255ed1145e89dd5
Author: Manuel Rüger <mrueg@gentoo.org>
Date:   2017-06-16 16:48:34 +0200

    emerge: Add head commit per repo to --info

    This adds the following to emerge --info output for git and rsync based
    repositories:

    Head commit of repository gentoo: 0518b330edac963f54f98df33391b8e7b9eaee4c

    Reviewed-By: Zac Medico <zmedico@gentoo.org>
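For illustration, extracting the commit from an rsync-synced tree might look like this. It is a minimal sketch, assuming the first whitespace-separated token on the first line of metadata/timestamp.commit is the commit hash; `rsync_head_commit` is a made-up helper name, not the function portage itself uses.

```python
import os
from typing import Optional


def rsync_head_commit(repo_location: str) -> Optional[str]:
    """Return the commit hash recorded in metadata/timestamp.commit, if any.

    Assumes the hash is the first token on the first line of the file.
    """
    path = os.path.join(repo_location, "metadata", "timestamp.commit")
    try:
        with open(path, encoding="utf-8") as f:
            fields = f.readline().split()
    except OSError:
        return None  # file missing or unreadable
    return fields[0] if fields else None
```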
I suppose we could sample the source repository git commit at the time that EbuildBinpkg injects it into the binarytree here: https://gitweb.gentoo.org/proj/portage.git/commit/?id=89df7574a355a245e19ba297c3685997eec6bbbe However, the git commit would then be incorrect if the repository was synced after the build started, so it's better if we make EbuildBuild record the git commit hash in the ${PORTAGE_BUILDDIR}/build-info directory where it also keeps a copy of the ebuild. I suppose we should also include commit hashes for any parent repositories that eclasses were inherited from.
The bug has been referenced in the following commit(s):

https://gitweb.gentoo.org/proj/portage.git/commit/?id=eea598a20b2db5ecbe3975dc96885f529ae54c1c

commit eea598a20b2db5ecbe3975dc96885f529ae54c1c
Author:     Zac Medico <zmedico@gentoo.org>
AuthorDate: 2024-03-09 21:22:35 +0000
Commit:     Zac Medico <zmedico@gentoo.org>
CommitDate: 2024-03-09 21:22:35 +0000

    __dyn_install: Record REPO_REVISIONS in build-info

    Record REPO_REVISIONS as a json object that maps repo name
    to revision for an ebuild's source repository and any
    repositories that eclasses were inherited from:

    $ cat /var/tmp/portage/sys-apps/portage-3.0.63/build-info/REPO_REVISIONS
    {"gentoo": "34875e30e73e33d3597d1101cdf97dc22729b268"}

    Ultimately the intention is to expose this information in
    binhost metadata so that clients can select consistent
    revisions of source repositories.

    Bug: https://bugs.gentoo.org/924772
    Signed-off-by: Zac Medico <zmedico@gentoo.org>

 bin/phase-functions.sh                             |  1 +
 lib/_emerge/EbuildPhase.py                         | 46 ++++++++++++++++++++++
 .../package/ebuild/_config/special_env_vars.py     |  1 +
 3 files changed, 48 insertions(+)
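To make the shape of that value concrete, here is a small sketch that builds the same kind of json object from an ebuild's repository plus the repositories its eclasses came from. It is not the code from the commit above: `repo_revisions_json` and its parameters are hypothetical, and `head_commits` stands in for however the head revision of each repository is looked up.

```python
import json
from typing import Iterable, Mapping


def repo_revisions_json(
    ebuild_repo: str,
    eclass_repos: Iterable[str],
    head_commits: Mapping[str, str],
) -> str:
    """Serialize repo name -> revision for the repos a build depended on.

    Hypothetical helper; head_commits maps repo name to its head revision.
    """
    names = {ebuild_repo, *eclass_repos}
    # Repos with no known revision (e.g. no VCS metadata) are left out.
    revisions = {name: head_commits[name] for name in names if name in head_commits}
    return json.dumps(revisions, sort_keys=True)
```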
I'm thinking about how to merge REPO_REVISIONS values from individual packages into a global REPO_REVISIONS value for the packages index. When we do this, we need to ensure that newer revisions are not replaced with older revisions, for example if a package built against an older sync finishes building after other packages built against a newer sync have already merged their REPO_REVISIONS into the global REPO_REVISIONS value. One way to do this is to only merge revisions into the global REPO_REVISIONS value if they correspond to the currently synced repository state, which will serve to filter out older values.
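A minimal sketch of that filter, assuming all three values are plain dicts mapping repo name to revision (`merge_repo_revisions` and `synced_heads` are names invented for this example):

```python
from typing import Dict, Mapping


def merge_repo_revisions(
    global_revisions: Mapping[str, str],
    package_revisions: Mapping[str, str],
    synced_heads: Mapping[str, str],
) -> Dict[str, str]:
    """Merge a package's REPO_REVISIONS into the global value.

    A revision is accepted only if it equals the currently synced head of
    its repo, so packages built against an older sync cannot move the
    global value backward.
    """
    merged = dict(global_revisions)
    for repo, revision in package_revisions.items():
        if synced_heads.get(repo) == revision:
            merged[repo] = revision
    return merged
```

With this rule a late-finishing build against an older sync is simply ignored, at the cost of never recording revisions for packages that were built before the most recent sync.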
(In reply to Zac Medico from comment #14)
> One way to do this is to only merge revisions into the global REPO_REVISIONS
> value if they correspond to the currently sync repository state, which will
> serve to filter out older values.

The rsync sync-rcu option makes this a little tricky because running processes hold references to older snapshots. We can detect this case by checking if the repo location and user_location still refer to the same path.
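One possible shape for that check is sketched below. It assumes `user_location` is the configured path (a symlink that sync-rcu re-points at the newest snapshot) and `location` is the snapshot path the running process resolved when it started; `snapshot_is_current` is a hypothetical helper, and the real comparison in portage may differ.

```python
import os


def snapshot_is_current(location: str, user_location: str) -> bool:
    """Return True if both paths still resolve to the same directory.

    Assumption: user_location is a symlink that is re-pointed on each
    sync, while location is the snapshot this process is holding.
    """
    return os.path.realpath(location) == os.path.realpath(user_location)
```

A process for which this returns `False` is building against a stale snapshot, so its revisions would be skipped when merging into the global value.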
(In reply to Zac Medico from comment #15) FWIW, I suspect this option isn't very popular at the moment (which is a shame, as it's great). Not that we should ignore it, ofc.
I'm thinking about adding a log of recently synced repo revisions that we can use as a database to ensure that the binhost's exported REPO_REVISIONS always progress forward and never backward.
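As a rough sketch of how such a log could enforce forward-only progress: the class below keeps an ordered in-memory history per repo and only accepts a candidate revision that was synced at or after the one currently exported. `RevisionLog`, its methods and the `max_entries` limit are all invented for this example; a real implementation would also need to persist the log.

```python
from typing import Dict, List, Optional


class RevisionLog:
    """Ordered log of recently synced revisions per repo (oldest first)."""

    def __init__(self, max_entries: int = 25):
        # max_entries is an arbitrary, positive limit chosen for the sketch.
        self.max_entries = max_entries
        self._history: Dict[str, List[str]] = {}

    def record_sync(self, repo: str, revision: str) -> None:
        history = self._history.setdefault(repo, [])
        if not history or history[-1] != revision:
            history.append(revision)
            del history[:-self.max_entries]  # keep only the newest entries

    def is_forward(self, repo: str, exported: Optional[str], candidate: str) -> bool:
        """True if exporting candidate would not move the repo backward."""
        history = self._history.get(repo, [])
        if candidate not in history:
            return False  # unknown revision: cannot be ordered, so reject
        if exported is None or exported not in history:
            # Assumption: an exported revision that has aged out of the
            # log is older than everything still recorded.
            return True
        return history.index(candidate) >= history.index(exported)
```

For example, after syncing revisions `a`, `b`, `c` in that order, a package built against `a` would be rejected once `b` has been exported, while one built against `c` would be accepted.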