Bug 937906 - snapshots/squashfs/sha512sum.txt not containing latest gentoo-current.xz.sqfs
Status: CONFIRMED
Alias: None
Product: Gentoo Infrastructure
Classification: Unclassified
Component: Other
Hardware: All All
Importance: Normal normal
Assignee: Gentoo Infrastructure
Reported: 2024-08-14 07:01 UTC by Christian Nilsson
Modified: 2024-08-14 19:25 UTC
CC List: 3 users

Description Christian Nilsson 2024-08-14 07:01:22 UTC
curl -L -C - --remote-name-all http://distfiles.gentoo.org/snapshots/squashfs/gentoo-current.xz.sqfs http://distfiles.gentoo.org/snapshots/squashfs/sha512sum.txt; sha512sum -c sha512sum.txt

This has only succeeded on very rare occasions.

Currently gentoo-current.xz.sqfs is identical to gentoo-20240812.xz.sqfs.
Most of the time, however, the SHA-512 hash of gentoo-current.xz.sqfs is missing from sha512sum.txt, and it only appears there 1-2 days later.
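
A minimal sketch of how such a comparison can be made: hash the downloaded file and list every name in sha512sum.txt that carries the same hash. The function name matching_entries is hypothetical, and the line format "<sha512>  <name>" (optionally "*<name>") with no spaces in file names is an assumption about sha512sum.txt, not something confirmed in this bug.

```shell
#!/bin/sh
# Hypothetical helper: matching_entries FILE LIST
# Prints every file name in LIST whose recorded hash equals the SHA-512 of FILE.
# Assumes LIST lines look like "<sha512>  <name>" or "<sha512> *<name>".
matching_entries() {
    # Hash the local file; stdout is empty if sha512sum fails (e.g. missing file).
    digest=$(sha512sum "$1" | awk '{print $1}')
    [ -n "$digest" ] || return 1
    # Print the name field of each line with that hash, dropping a binary-mode "*".
    awk -v d="$digest" '$1 == d { sub(/^\*/, "", $2); print $2 }' "$2"
}
```

Running matching_entries gentoo-current.xz.sqfs sha512sum.txt after the download prints nothing when the list has not yet caught up with the file.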

Original analysis at https://github.com/ASoft-se/Gentoo-HAI/issues/72
Similar mentions at https://forums.gentoo.org/viewtopic-p-8582013.html#8582013

I have not been able to identify the original source of these files, so I cannot rule out sync issues between mirrors.

Where are these files generated? Maybe looking at the script would provide a good explanation.

Wanted resolution:
1. Have sha512sum.txt match the content on the mirror
2. Document the inconsistency and why it happens
Comment 1 Robin Johnson archtester Gentoo Infrastructure gentoo-dev Security 2024-08-14 17:23:12 UTC
The explanation is simple:
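distfiles.gentoo.org is a CDN, and objects may be evicted from the cache at different rates, so the cached sha512sum.txt and the cached gentoo-current.xz.sqfs can come from different points in time, leading to the checksum not matching.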

From the infrastructure perspective, I would like us to STOP distributing any binary artifacts that change, e.g. "gentoo-current.xz.sqfs"

This would however break "easy" scripts that want a consistent URL direct to an artifact - they won't want to parse a textfile to identify the correct artifact to download.

The issue DOES also exist in the underlying mirrors, but is harder to catch there:

gentoo.osuosl.org is the public origin used for the CDN, but the underlying private origin is the only place guaranteed to be consistent. The rsync from the private origin to the public origin still has brief periods where it's also inconsistent, because we don't have significant control over that public origin.
Comment 2 Christian Nilsson 2024-08-14 19:25:24 UTC
I wasn't aware it was a full CDN these days, but checking the headers now, it's clear.
And seeing that, it all makes sense, thanks!

Regarding the forum post, I would expect the rsync source to be more in sync, but my testing of that part may be lacking (the commands at the end of this comment work fine).
Having the same filename does simplify rsync updates, but I fully understand and agree with the stance of not distributing binaries that change, and it is possible to handle this quite easily by grabbing the list, grepping, and sorting.
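
A rough sketch of that "grab the list, grep, sort" approach, for scripts that want a stable dated artifact instead of gentoo-current.xz.sqfs. The function name latest_snapshot is hypothetical, and it assumes that sha512sum.txt lists dated files named gentoo-YYYYMMDD.xz.sqfs (the pattern seen in gentoo-20240812.xz.sqfs); the actual contents of the list are not confirmed in this bug.

```shell
#!/bin/sh
# Hypothetical helper: latest_snapshot LIST
# Prints the newest dated snapshot name found in a local copy of sha512sum.txt.
# Assumes dated entries are named gentoo-YYYYMMDD.xz.sqfs.
latest_snapshot() {
    # Extract only the dated names, then rely on YYYYMMDD sorting
    # chronologically under a plain lexical sort.
    grep -Eo 'gentoo-[0-9]{8}\.xz\.sqfs' "$1" | sort -u | tail -n 1
}
```

A caller could then fetch sha512sum.txt first, download the name printed by latest_snapshot sha512sum.txt from the same directory, and verify just that entry with grep -F "$name" sha512sum.txt | sha512sum -c -.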

I'm happy to have the answer as documentation, and to resolve this as "works as intended".
For the sake of documentation, though, could we link to a resource showing the script that does the update? I assume it is in some infra git repo, but I'm not sure where to look.

rsync -v --copy-links rsync://gentoo.osuosl.org/gentoo/snapshots/squashfs/gentoo-current.xz.sqfs rsync://gentoo.osuosl.org/gentoo/snapshots/squashfs/sha512sum.txt .; sha512sum -c sha512sum.txt