# the current set went live on 2017-11-21, per 2017-11-12 Council meeting
# https://archives.gentoo.org/gentoo-dev/message/ba2e5d9666ebd7e1bff1143485a37856
manifest-hashes = BLAKE2B SHA512

3.5 years later, it might be reasonable to finally remove SHA512. I don't have any strong arguments for or against it; I just see it as a cleanup move.
I strongly object here: The idea of listing multiple hashes is that it will protect us in the event that there is a problem with one hash type, so we should always keep at least two competing hashes.
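For readers unfamiliar with what the setting controls, here is a minimal sketch of how a DIST-style Manifest line with both configured digests could be produced in one pass over the file. The function name `manifest_dist_entry` and the name-to-constructor mapping are hypothetical illustrations, not Portage's or gemato's actual code.

```python
import hashlib


def manifest_dist_entry(path, name, hashes=("BLAKE2B", "SHA512")):
    """Build a DIST-style Manifest line, reading the file once for all digests."""
    # Assumed mapping from Manifest hash names to hashlib constructors.
    constructors = {"BLAKE2B": hashlib.blake2b, "SHA512": hashlib.sha512}
    states = [(h, constructors[h]()) for h in hashes]
    size = 0
    with open(path, "rb") as f:
        # Read in 1 MiB chunks so large distfiles are never loaded whole.
        for chunk in iter(lambda: f.read(1 << 20), b""):
            size += len(chunk)
            for _, state in states:
                state.update(chunk)
    fields = ["DIST", name, str(size)]
    for h, state in states:
        fields += [h, state.hexdigest()]
    return " ".join(fields)
```

Dropping SHA512 would correspond to passing `hashes=("BLAKE2B",)` in this sketch; the file is still read only once either way, so the saving is CPU time and Manifest size rather than I/O.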
Could we phase out SHA512 and add KECCAK instead, or is it too slow?
(In reply to Ulrich Müller from comment #2)
> Could we phase out SHA512 and add KECCAK instead, or is it too slow?

Very primitive benchmark (on i7-8550U @ 1.80GHz):

$ time for (( i=0; i<100; i++ )); do b2sum /var/cache/distfiles/emacs-27.2.tar.xz >/dev/null; done

real    0m5.879s
user    0m5.203s
sys     0m0.674s

$ time for (( i=0; i<100; i++ )); do sha512sum /var/cache/distfiles/emacs-27.2.tar.xz >/dev/null; done

real    0m10.894s
user    0m10.170s
sys     0m0.723s

$ time for (( i=0; i<100; i++ )); do openssl dgst -blake2b512 /var/cache/distfiles/emacs-27.2.tar.xz >/dev/null; done

real    0m6.217s
user    0m5.450s
sys     0m0.766s

$ time for (( i=0; i<100; i++ )); do openssl dgst -sha512 /var/cache/distfiles/emacs-27.2.tar.xz >/dev/null; done

real    0m7.761s
user    0m6.933s
sys     0m0.828s

$ time for (( i=0; i<100; i++ )); do openssl dgst -sha3-512 /var/cache/distfiles/emacs-27.2.tar.xz >/dev/null; done

real    0m24.336s
user    0m23.491s
sys     0m0.846s

Taking the numbers from openssl, sha3 (keccak) is slower by a factor of 3 and 4 compared to sha2 and blake2, respectively.

Interestingly, openssl is some 40% faster than coreutils for sha2.
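The factors quoted in this comment can be reproduced from the "real" timings with a few divisions. This is only a sketch of the arithmetic; the dictionary keys are labels chosen here for illustration.

```python
# Wall-clock ("real") seconds for 100 runs, copied from the timings in this comment.
openssl_real = {"blake2b512": 6.217, "sha512": 7.761, "sha3-512": 24.336}
coreutils_real = {"b2sum": 5.879, "sha512sum": 10.894}


def ratio(slower, faster):
    """How many times longer the slower run took."""
    return slower / faster


sha3_vs_sha2 = ratio(openssl_real["sha3-512"], openssl_real["sha512"])
sha3_vs_blake2 = ratio(openssl_real["sha3-512"], openssl_real["blake2b512"])
coreutils_vs_openssl_sha2 = ratio(coreutils_real["sha512sum"], openssl_real["sha512"])

print(f"sha3 vs sha2 (openssl):   {sha3_vs_sha2:.2f}x")
print(f"sha3 vs blake2 (openssl): {sha3_vs_blake2:.2f}x")
print(f"sha512sum vs openssl sha512: {coreutils_vs_openssl_sha2:.2f}x")
```

The three ratios come out at roughly 3.1, 3.9 and 1.4, matching the "factor of 3 and 4" and "some 40%" statements.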
(In reply to Thomas Deutschmann from comment #1)
> I strongly object here: The idea of listing multiple hashes is that it will
> protect us in the event that there is a problem with one hash type, so we
> should always keep at least two competing hashes.

IIRC the last discussion ended up with the conclusion that there's no real value in having multiple hashes, and using them like this in the past was pretty much cargo cult. I think we're better than that.
(In reply to Ulrich Müller from comment #3)
> Taking the numbers from openssl, sha3 (keccak) is slower by a factor of 3
> and 4 compared to sha2 and blake2, respectively.
>
> Interestingly, openssl is some 40% faster than coreutils for sha2.

I think what matters in practice is the Python implementation that Portage/pkgcore uses. gemato includes a primitive benchmarking tool.

On Ryzen 5 3600, on tmpfs:

$ /home/mgorny/git/gemato/utils/benchmark.py chromium-91.0.4472.19.tar.xz blake2b sha512 sha3_512
['blake2b'] -> [1.4790698910001083, 1.4788789770009316, 1.4786044790016604, 1.4795997879991774, 1.480962300000101]
['sha512'] -> [1.6823269540000183, 1.683155593000265, 1.6835432360003324, 1.6823211499995523, 1.6972763199992187]
['sha3_512'] -> [5.392342687000564, 5.385123105001185, 5.393110663000698, 5.4156869710004685, 5.429470488999868]

That said, I think Python's blake2b impl is probably suffering from some performance problems (again?) ;-/.
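For anyone who wants to repeat this kind of measurement without gemato, here is a rough sketch using only `hashlib` and `time.perf_counter`. It is not gemato's `benchmark.py`; the `benchmark` function, its parameters, and the zero-filled stand-in payload are assumptions made for illustration.

```python
import hashlib
import time


def benchmark(data, algorithms, repeats=5, chunk_size=1 << 20):
    """Return {algorithm: [seconds, ...]} for hashing `data` `repeats` times."""
    results = {}
    view = memoryview(data)  # slicing a memoryview avoids copying each chunk
    for name in algorithms:
        timings = []
        for _ in range(repeats):
            h = hashlib.new(name)
            start = time.perf_counter()
            for offset in range(0, len(view), chunk_size):
                h.update(view[offset:offset + chunk_size])
            h.digest()
            timings.append(time.perf_counter() - start)
        results[name] = timings
    return results


if __name__ == "__main__":
    # 16 MiB of zeros as a stand-in; a real distfile read into memory works the same.
    payload = bytes(16 * 1024 * 1024)
    for name, timings in benchmark(payload, ["blake2b", "sha512", "sha3_512"]).items():
        print([name], "->", timings, "-> min:", min(timings))
```

Hashing from memory keeps disk I/O out of the numbers, which is similar in spirit to running the benchmark on tmpfs.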
(In reply to Michał Górny from comment #5)
> (In reply to Ulrich Müller from comment #3)
> > Taking the numbers from openssl, sha3 (keccak) is slower by a factor of 3
> > and 4 compared to sha2 and blake2, respectively.
> >
> > Interestingly, openssl is some 40% faster than coreutils for sha2.
>
> I think what matters in practice is the Python implementation that
> Portage/pkgcore uses. gemato includes a primitive benchmarking tool.
>
> On Ryzen 5 3600, on tmpfs:
>
> $ /home/mgorny/git/gemato/utils/benchmark.py chromium-91.0.4472.19.tar.xz
> blake2b sha512 sha3_512
> ['blake2b'] -> [1.4790698910001083, 1.4788789770009316, 1.4786044790016604,
> 1.4795997879991774, 1.480962300000101]
> ['sha512'] -> [1.6823269540000183, 1.683155593000265, 1.6835432360003324,
> 1.6823211499995523, 1.6972763199992187]
> ['sha3_512'] -> [5.392342687000564, 5.385123105001185, 5.393110663000698,
> 5.4156869710004685, 5.429470488999868]

That confirms (roughly) the factors from comment #3. sha3 is very slow.

> That said, I think Python's blake2b impl is probably suffering from some
> performance problems (again?) ;-/.

Needs to be rewritten in Rust, I guess. :p
According to [1], BLAKE2b should be around 30% faster than SHA2-512. Of course, it's possible that SHA2 has become faster since then, or maybe sha_ni CPU extensions are helping here. Still, I'll take a few minutes to investigate whether Python is choosing the optimal variant for me.

[1] https://www.blake2.net/
(In reply to Michał Górny from comment #7)
> [...] or maybe sha_ni CPU extensions are helping here.

No sha_ni here (Kaby Lake processor), but I can confirm that it's less than 30% difference:

$ utils/benchmark.py /var/cache/distfiles/emacs-27.2.tar.xz blake2b sha512 sha3_512
['blake2b'] -> [0.05584478902164847, 0.05420189001597464, 0.054973421967588365, 0.05634020798606798, 0.05651942198164761]
['sha512'] -> [0.0730064709787257, 0.07224606099771336, 0.07204092695610598, 0.07199707702966407, 0.07207291800295934]
['sha3_512'] -> [0.2509028369677253, 0.2516804229817353, 0.2471757759922184, 0.2528015899588354, 0.25678251701174304]
Curiously enough, I get the expected results when the system is under load (niced):

$ /home/mgorny/git/gemato/utils/benchmark.py /tmp/dist/chromium-91.0.4472.19.tar.xz blake2b sha512
['blake2b'] -> [ 1.679 1.652 1.688 1.703 1.656 ] -> min: 1.652
['sha512'] -> [ 2.687 2.748 2.741 2.78 2.67 ] -> min: 2.67

Without load:

$ /home/mgorny/git/gemato/utils/benchmark.py /tmp/dist/chromium-91.0.4472.19.tar.xz blake2b sha512
['blake2b'] -> [ 1.475 1.475 1.482 1.478 1.475 ] -> min: 1.475
['sha512'] -> [ 1.677 1.677 1.676 1.676 1.676 ] -> min: 1.676
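To put the two runs side by side, the minimum timings can be turned into a percentage. This is a small sketch; "how much longer sha512 takes than blake2b" is only one possible reading of "30% faster", and the helper name `extra_time_percent` is made up here.

```python
def extra_time_percent(baseline, other):
    """How much longer `other` takes than `baseline`, in percent."""
    return (other / baseline - 1.0) * 100.0


# Minimum timings (seconds) copied from the two runs in this comment.
runs = {
    "under load": {"blake2b": 1.652, "sha512": 2.67},
    "without load": {"blake2b": 1.475, "sha512": 1.676},
}

for label, best in runs.items():
    pct = extra_time_percent(best["blake2b"], best["sha512"])
    print(f"{label}: sha512 takes {pct:.0f}% longer than blake2b")
```

By this measure sha512 needs roughly 62% more time than blake2b under load, but only about 14% more on the idle system.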
Ok, there doesn't seem to be anything wrong inside Python. I've tested all the variants and:

- AVX and SSE4.1 variants are the fastest (for some reason SSE4.1 seems to be a tiny bit faster than AVX but that might be measurement error)
- SSSE3 is slightly slower
- reference (C) implementation is slightly slower than SSSE3
- SSE2 is awfully slow (as expected, Python is disabling it entirely)
The discussion to date seems to revolve around performance, not functionality.

Something that was already partially lost in dropping earlier digests was the ability to trivially compare our digests against upstream digests. Dropping SHA512 will further limit that, because extremely few upstreams ship BLAKE2 digests today, let alone SHA-3/Keccak.

Even the latest OpenSSL announcements are only SHA1 + SHA256:
https://mta.openssl.org/pipermail/openssl-announce/2021-April/000200.html
Similarly OpenSSH (with the interesting tweak that they use base64 SHA256 rather than hex).

What external functionality are we likely to lose by removing the SHA512 hash?

I'm not so worried about the Manifest sizes in this; I care more about being able to quickly ascertain that the hash in Gentoo is the same hash announced by upstream.
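As an illustration of the comparison described here, the following sketch checks a file's contents against an upstream-announced digest that may be written either in hex or in base64 (the OpenSSH-style case mentioned above). The helper name `digest_matches` is hypothetical, and the sketch assumes the announcement states which algorithm was used.

```python
import base64
import binascii
import hashlib
import hmac


def digest_matches(data, algorithm, announced):
    """Check `data` against an announced digest given as hex or base64."""
    raw = hashlib.new(algorithm, data).digest()
    announced = announced.strip()
    # Try hex first; a hex string of the right length decodes to len(raw) bytes.
    try:
        candidate = bytes.fromhex(announced)
    except ValueError:
        candidate = None
    if candidate is None or len(candidate) != len(raw):
        # Fall back to base64, tolerating missing '=' padding.
        padded = announced + "=" * (-len(announced) % 4)
        try:
            candidate = base64.b64decode(padded, validate=True)
        except (binascii.Error, ValueError):
            return False
    return hmac.compare_digest(candidate, raw)
```

The check only works when both sides have a digest for the same algorithm, which is the point of the comment: with BLAKE2B as the only Manifest hash, there is nothing on the Gentoo side to compare against an upstream SHA256 or SHA512 value without hashing the distfile again.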
Closing per negative mailing list feedback.