It is great that portage-2.3.12 now correctly handles zstd for tbz2 archives. Current versions of zstd also support a long-range option such as --long=31, which can remarkably decrease the archive size, especially for large packages like the Linux kernel, essentially at the cost of (in this example) a 2 GB memory requirement for unpacking. Unfortunately, it is currently not possible to put this option into BINPKG_COMPRESS_FLAGS, because an option that permits the allocation of more memory must then also be specified during unpacking. Therefore, it would be nice if portage supported e.g. a BINPKG_UNCOMPRESS_FLAGS_ZSTD variable which is passed to unzstd for decompression. Perhaps other variables like BINPKG_UNCOMPRESS_FLAGS_XZ would also make sense, because unxz has a similar issue concerning memory limits.
Created attachment 542086 [details, diff]
Support BINPKG_COMPRESS_FLAGS="--long=..." with zstd

With the attached patch, zstd will decompress tbz2 files which were compressed with e.g. BINPKG_COMPRESS="zstd" BINPKG_COMPRESS_FLAGS="--long=29". The passed option --memory=4294967295 might look aggressive, but I tested: If this option is used, it is *not* required that the option has the required memory. In fact, the amount of memory allocated by zstd with that option is exactly the amount necessary to decompress the given file. In other words, the only effect of this option is that zstd does not *refuse* to decompress a file whose decompression requires more memory than the default.
s/that the option has the required memory/that the machine has the mentioned amount of memory/
(In reply to Martin Väth from comment #1)
> The passed option --memory=4294967295 might look aggressive, but I tested:
Where did you get this number? It would be nicer if we could simply tell it to use as much memory as necessary, rather than a specific number.
> simply tell it to use as much memory as necessary
That's exactly what zstd -d does: It only allocates the window size (which is stored in the file) needed to decompress the file, independent of the passed --memory option. The purpose of --memory is only to serve as a sanity check: zstd dies if the window size stored in the file is larger than specified by the --memory option (maybe the alternative equivalent name --memlimit-decompress makes this clearer). The default value of that option is rather small: 1 << 27 = 134217728 bytes = 128 MiB. (The reason is that a longer window is never used unless you explicitly pass --long=... for compression.)
> Where did you get this number
It is the largest number which can be passed, namely the largest number fitting into a 32-bit unsigned integer, thus making zstd accept every possible file. It would be completely equivalent to use the value 1 << 31, because (as mentioned) zstd -d uses the window size stored in the file, and the largest possible window is obtained with --long=31.
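The sanity-check behaviour described in this comment can be illustrated with a small Python model. This is only a sketch of the described semantics, not zstd's code: the names `decompression_window`, `WindowTooLarge` and `DEFAULT_MEMLIMIT` are hypothetical, and the only facts taken from the thread are the default limit of 1 << 27 and the rule that the limit is checked but never allocated.

```python
DEFAULT_MEMLIMIT = 1 << 27  # 128 MiB, the default limit mentioned in the comment


class WindowTooLarge(Exception):
    """Raised when a frame asks for a window above the allowed limit."""


def decompression_window(frame_window_size, memlimit=DEFAULT_MEMLIMIT):
    """Return the number of bytes to allocate for the decompression window.

    Hypothetical model: the limit only decides whether the frame is
    accepted; the allocation always equals the window size recorded
    in the file, however large the limit is.
    """
    if frame_window_size > memlimit:
        raise WindowTooLarge(
            "window of %d bytes exceeds limit of %d bytes"
            % (frame_window_size, memlimit)
        )
    return frame_window_size
```

In this model a file compressed with --long=29 is refused under the default limit, while raising the limit to 4294967295 still allocates only 1 << 29 bytes.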
I've just tested with zstd-1.3.5, and the largest --memory value it will accept is --memory=4294967289 since this commit: https://github.com/facebook/zstd/commit/9cd5c63771a21c5769366e058d1d8bf1cea89970
I also tested with K (1 << 10) and M (1 << 20) suffixes, and the max allowed values for those are --memory=4194303K and --memory=4095M.
So --long=31 is the max --long value, which corresponds to a --memory value of 1 << 31, which is equivalent to --memory=2048M. So, how about if we use --memory=2048M for readability?
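The correspondence between a --long window log and a --memory value is plain arithmetic, shown here as a short Python sketch. The helper name `memory_flag_for_window_log` is made up for illustration; the M suffix is taken to mean 1 << 20 as stated above.

```python
def memory_flag_for_window_log(window_log):
    """Build a --memory argument matching a given --long window log.

    Hypothetical helper: uses the M suffix (1 << 20 bytes) when the
    window size is a whole number of mebibytes, else plain bytes.
    """
    window_bytes = 1 << window_log
    if window_bytes % (1 << 20) == 0:
        return "--memory=%dM" % (window_bytes >> 20)
    return "--memory=%d" % window_bytes
```

For a window log of 31 this gives --memory=2048M, and for the default of 27 it gives --memory=128M.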
Actually maybe it's easier to simply pass --long=31 to the decompressor, after reading this part of the man page:
> windowLog=wlog, wlog=wlog
>
> Specify the maximum number of bits for a match distance.
>
> The higher number of increases the chance to find a match which
> usually improves compression ratio. It also increases memory
> requirements for the compressor and decompressor. The minimum wlog
> is 10 (1 KiB) and the maximum is 30 (1 GiB) on 32-bit platforms
> and 31 (2 GiB) on 64-bit platforms.
>
> Note: If windowLog is set to larger than 27, --long=windowLog or
> --memory=windowSize needs to be passed to the decompressor.
We'll have to do a 32-bit build to test if --long=31 is allowed for decompression on 32-bit systems, since zstd has conditionals like this:
> #define ZSTD_WINDOWLOG_MAX ((unsigned)(sizeof(size_t) == 4 ? ZSTD_WINDOWLOG_MAX_32 : ZSTD_WINDOWLOG_MAX_64))
(In reply to Zac Medico from comment #5)
> I've just tested with zstd-1.3.5
I didn't check newer versions than in the gentoo repository...
> --memory=4294967289
This looks like a laziness bug: The commit checks whether 429496728 (the number without its last digit) is at most MAX_UINT/10 - 1. The "K" and "M" variants of the same commit work differently and do not suffer from this bug. To avoid such problems, the number should probably be decreased. As mentioned, specifying the number 1 << 31 is the same.
> maybe it's easier to simply pass --long=31 to the decompressor
The CLI code handles this option differently by setting not only memlimit but also compressionParams.windowLog to 31, which is passed to the (de?)compression algorithm. I was afraid that this might override the window size stored in the file, but it appears that this value is used only for compression: command time -f %M unzstd --long=31 ... did not show that an unusually large amount of memory was allocated, and decompression worked with files not compressed with --long=31. So probably you are right: --long=31 should be fine.
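On the --memory limits discussed in this comment: the observed maxima (4294967289, 4194303K and 4095M) can be reproduced with a small Python model of a parser that tests for overflow before each multiplication or shift. This is a sketch and not zstd's source. The decimal check follows the description above; the suffix check against UINT_MAX >> 10 is an assumption chosen because it matches the reported limits, and the name `parse_memory_arg` is hypothetical.

```python
UINT_MAX = (1 << 32) - 1  # assumes a 32-bit unsigned, as in the thread


def parse_memory_arg(text):
    """Model of a size parser with pre-multiplication overflow checks."""
    digits = text.rstrip("KM")
    suffix = text[len(digits):]
    if not digits.isdigit() or suffix not in ("", "K", "M"):
        raise ValueError("malformed value: %r" % text)

    result = 0
    limit = UINT_MAX // 10 - 1  # 429496728, one less than strictly needed
    for ch in digits:
        # The check runs before the next digit is appended, so the last
        # digit is never inspected: 4294967289 passes, 4294967290 fails.
        if result > limit:
            raise OverflowError("numeric value too large")
        result = result * 10 + int(ch)

    shift_limit = UINT_MAX >> 10  # 4194303 (assumed suffix check)
    for _ in range({"": 0, "K": 1, "M": 2}[suffix]):
        if result > shift_limit:
            raise OverflowError("numeric value too large")
        result <<= 10
    return result
```

Under this model a value such as 2048M is accepted and equals 1 << 31, while 4294967295 is rejected even though it fits into 32 bits.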
(In reply to Zac Medico from comment #9)
> We'll have to do a 32-bit build to test if --long=31 is allowed
I tested in my x86 chroot: --long=31 is not accepted for compressing, but it is accepted for decompressing. Unsurprisingly, it is never possible to decompress files compressed with --long=31 on x86, but it does not hurt to use this option for decompressing, i.e. no case distinction is necessary for decompressing. (For compression, the situation would be different.)
The bug has been referenced in the following commit(s):

https://gitweb.gentoo.org/proj/portage.git/commit/?id=f391b2cc5384fc38e99a0598cb3de2346e297c25

commit f391b2cc5384fc38e99a0598cb3de2346e297c25
Author: Zac Medico <zmedico@gentoo.org>
AuthorDate: 2018-08-04 20:18:47 +0000
Commit: Zac Medico <zmedico@gentoo.org>
CommitDate: 2018-08-04 20:25:23 +0000

    compression_probe: decompress zstd --long=31 (bug 634980)

    In order to decompress files compressed with zstd --long=31, add --long=31 to the zstd decompress options. Even though zstd compression does not support --long=31 on 32-bit platforms, decompression with --long=31 still works as long as the file was compressed with a smaller windowLog.

    Reported-by: Martin Väth <martin@mvath.de>
    Bug: https://bugs.gentoo.org/634980

 lib/portage/util/compression_probe.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
With app-arch/zstd-1.4.4-r2, the --long=31 argument does not work for 32-bit architectures, see bug 710444.
The bug has been referenced in the following commit(s):

https://gitweb.gentoo.org/proj/portage.git/commit/?id=07da257cfc80509c50104560b1e1508b9e585b98

commit 07da257cfc80509c50104560b1e1508b9e585b98
Author: Zac Medico <zmedico@gentoo.org>
AuthorDate: 2020-03-14 23:18:55 +0000
Commit: Zac Medico <zmedico@gentoo.org>
CommitDate: 2020-03-14 23:21:55 +0000

    compression_probe: omit zstd --long=31 on 32-bit arch (bug 710444)

    Omit the zstd --long=31 argument for decompression on 32-bit architectures, since the latest version of zstd will otherwise abort with an error on 32-bit architectures.

    Bug: https://bugs.gentoo.org/710444
    Bug: https://bugs.gentoo.org/634980
    Signed-off-by: Zac Medico <zmedico@gentoo.org>

 lib/portage/util/compression_probe.py | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)
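The case distinction described in this commit message might look roughly like the following Python sketch. It is not the code from the commit: the function name `zstd_decompress_command` is hypothetical, and using `sys.maxsize` to detect a 64-bit interpreter is an assumption made for illustration.

```python
import sys


def zstd_decompress_command(is_64bit=None):
    """Return an argument list for decompressing with zstd.

    Hypothetical sketch: --long=31 is added only on 64-bit builds,
    since the thread reports that zstd-1.4.4 rejects it on 32-bit.
    """
    if is_64bit is None:
        # Assumption: a pointer-sized maxsize above 2**32 means 64-bit.
        is_64bit = sys.maxsize > 2**32
    command = ["zstd", "-d"]
    if is_64bit:
        command.append("--long=31")
    return command
```

Passing `is_64bit` explicitly makes both branches easy to check without a 32-bit chroot.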