Summary: | github.com on-the-fly automatic artifacts / archives not guaranteed to be deterministic, shouldn't be used | ||
---|---|---|---|
Product: | Gentoo Linux | Reporter: | cJ <cJ-gentoo> |
Component: | Current packages | Assignee: | Gentoo Quality Assurance Team <qa> |
Status: | UNCONFIRMED --- | ||
Severity: | normal | CC: | floppym, gentoo, ionen, kfm, mgorny, sam |
Priority: | Normal | ||
Version: | unspecified | ||
Hardware: | All | ||
OS: | Linux | ||
See Also: |
https://bugs.gentoo.org/show_bug.cgi?id=881193 https://bugs.gentoo.org/show_bug.cgi?id=881249 https://bugs.gentoo.org/show_bug.cgi?id=881251 https://bugs.gentoo.org/show_bug.cgi?id=881253 https://bugs.gentoo.org/show_bug.cgi?id=881255 https://bugs.gentoo.org/show_bug.cgi?id=881257 https://bugs.gentoo.org/show_bug.cgi?id=881261 https://bugs.gentoo.org/show_bug.cgi?id=881263 |
||
Whiteboard: | |||
Package list: | Runtime testing required: | --- | |
Bug Depends on: | 881055, 881053 | ||
Bug Blocks: |
Description
cJ
2022-11-11 22:43:48 UTC
Even if not guaranteed, this has hardly ever been an issue and (as you say) we do have mirrors that are copying and keeping that frozen-in-time copy (little sense in replicating what mirrors do by mirroring it manually first).

And we already prioritize proper release tarballs when they exist (unless they have a problem, like missing files we need).

We don't have control over what overlays do, so I'm unsure what you want us to do here?

(In reply to Ionen Wolkens from comment #1)
> And we already prioritize proper release tarballs when they exist
> (unless they have a problem, like missing files we need).

On that note, feel free to file bugs if an ebuild should use it but isn't.

ionen: yeah, this is rather low priority for ::gentoo due to mirrors, but as you say, the existence of stable artifacts giving reproducible downloads could be scanned for (sam mentioned https://github.com/pkgcore/pkgcheck/issues/473).

Here's my dirty script that re-computes manifests, along with overnight (partial) results on ::gentoo: https://gist.github.com/zougloub/7fdea04c66e856fcac1000c398d795e1

At least with this, overlays can be scanned.

Manually filed a case where an alternate download source could be used. This could be automated for others.

Work-in-progress pertaining to this issue here: https://gitlab.com/cJ/gentoo-bug-881037-github-reproducible-downloads

Some of the checks could probably land in pkgcheck.

Filed an issue with github, because it would be so much more elegant if they were the ones to fix the problem.

I discussed this a bit at https://lists.reproducible-builds.org/pipermail/rb-general/2021-October/002422.html

The tl;dr is that this is not actually a real issue to my reasonably-certain knowledge, although I'd be interested in seeing credible proof consisting of before-and-after tarballs.

Github auto-generated tarballs are not "guaranteed" by github, because they are the result of running the git-archive program which github doesn't personally guarantee.
Luckily, it doesn't matter because that's all on the git project.

...

As far as I can tell, the discussion here is basically about theoreticals? Lots of links to issues from 5+ years ago. I'm only aware of a couple realistic sources of non-reproducible behavior, assuming you don't use a truly ancient version of git to generate them.

- unreproducible gzip (busybox gzip was "recently" fixed to be reproducible)
- renaming the github repository such as to capitalize or lowercase the repo name, as that is embedded in the base filepath
- obviously, re-tagging
- gitattributes export-subst can embed information from the git repository, and depending on the information and how you define it, that can be non-reproducible (for example, abbreviated commit hashes can grow longer as the repo grows, some methods of embedding the author/committer will respect a mailmap file)

Case 1 is solved, cases 2 and 3 are actually legitimate cases of upstream modification, and case 4 is a bug in upstream's export-subst handling.

Are there other real issues which aren't about a commit from git.git dating back to 2013?

...

Granted, if upstream themselves *provide* hand-generated dist tarballs this is superior, for several reasons that mostly don't have to do with reproducibility -- they can use better-than-gzip compression, they can have generated files included, they can have non-useful files *excluded* -- but if upstream doesn't provide them and maybe doesn't see much point because they don't use autotools, why is that a distro problem?

I'm working on some changes there: https://github.com/gentoo/gentoo/pull/28247 where there's a bunch of minor improvements to supply chain security (feel free to review). I'll soon (as soon as I'm done re-fetching everything) include a commit with updated checksums corresponding to the new auto-generated archives that have changed, which also shouldn't hurt.
I think saying github doesn't control git overlooks that they could have used a solution other than git-archive to serve the archives (and maybe they do), or that they could contribute to an open source project (git, libgit2, whatever) to make their export more deterministic. Anyway, I have also filed a request with github.

To clearly answer your question Eli: yes, I found occurrences of changed archives in > 2020... I will soon report a convincing list because there are a bunch in ::gentoo.

The bug has been referenced in the following commit(s):

https://gitweb.gentoo.org/repo/gentoo.git/commit/?id=80cc6b358ee5d7fe7a791dcd80c9297fe6a42fc9

commit 80cc6b358ee5d7fe7a791dcd80c9297fe6a42fc9
Author:     Jérôme Carretero <cJ@zougloub.eu>
AuthorDate: 2022-11-12 20:31:00 +0000
Commit:     Michał Górny <mgorny@gentoo.org>
CommitDate: 2022-11-14 03:41:03 +0000

    dev-python/pyproject-metadata: canonicalize SRC_URI

    Signed-off-by: Jérôme Carretero <cJ-gentoo@zougloub.eu>
    Bug: https://bugs.gentoo.org/881037
    Signed-off-by: Michał Górny <mgorny@gentoo.org>

 dev-python/pyproject-metadata/pyproject-metadata-0.5.0.ebuild | 2 +-
 dev-python/pyproject-metadata/pyproject-metadata-0.6.1.ebuild | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

Giving this to QA; I can't see any other team in Gentoo taking action on this.

Honestly, I don't recall the last time I've seen a checksum mismatch due to GitHub archives being unstable. All that I've seen is checksum mismatches due to upstream retagging, and that's what we want to catch.
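
For anyone who wants to check which of the mismatch sources listed earlier in this thread applies to a given before-and-after pair of tarballs, here is a minimal Python sketch. It is not the script linked above. The names `classify_change`, `member_hashes` and the demo helper `make_archive` are made up for illustration, and it assumes both inputs are gzip-compressed tar archives small enough to hold in memory. The idea is to peel the layers one at a time: compare the raw bytes, then the decompressed tar stream, then the per-file contents.

```python
import gzip
import hashlib
import io
import tarfile
from typing import Dict


def member_hashes(data: bytes) -> Dict[str, str]:
    """Map each regular file path in a .tar.gz to the SHA-256 of its contents."""
    hashes = {}
    with tarfile.open(fileobj=io.BytesIO(data), mode="r:gz") as tar:
        for member in tar:
            if member.isfile():
                content = tar.extractfile(member).read()
                hashes[member.name] = hashlib.sha256(content).hexdigest()
    return hashes


def classify_change(old: bytes, new: bytes) -> str:
    """Say at which layer two .tar.gz archives start to differ."""
    if old == new:
        return "identical"
    # Same tar stream, different gzip framing or compressor output.
    if gzip.decompress(old) == gzip.decompress(new):
        return "compression-only"
    # Same paths and file contents, different tar headers (mtime, owner, ...).
    if member_hashes(old) == member_hashes(new):
        return "tar-metadata-only"
    # Paths or contents differ: e.g. retag, renamed base directory, export-subst.
    return "files-changed"


def make_archive(files: Dict[str, bytes], gzip_mtime: int = 0,
                 tar_mtime: int = 0) -> bytes:
    """Demo helper (hypothetical): build a small .tar.gz in memory."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, content in sorted(files.items()):
            info = tarfile.TarInfo(name)
            info.size = len(content)
            info.mtime = tar_mtime
            tar.addfile(info, io.BytesIO(content))
    return gzip.compress(buf.getvalue(), mtime=gzip_mtime)


if __name__ == "__main__":
    base = make_archive({"pkg-1.0/a.txt": b"hello\n"})
    renamed = make_archive({"Pkg-1.0/a.txt": b"hello\n"})
    print(classify_change(base, renamed))
```

A "compression-only" or "tar-metadata-only" result would point at the archive generator, while "files-changed" is the kind of upstream modification (retagging, repository rename) that a checksum mismatch is supposed to catch. Note that a renamed base directory shows up as "files-changed" here because member paths are part of the comparison.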