Gentoo Websites Logo
Go to: Gentoo Home Documentation Forums Lists Bugs Planet Store Wiki Get Gentoo!
Bug 634980 - sys-apps/portage: support zstd --long=31 option in BINPKG_COMPRESS_FLAGS
Summary: sys-apps/portage: support zstd --long=31 option in BINPKG_COMPRESS_FLAGS
Status: RESOLVED FIXED
Alias: None
Product: Portage Development
Classification: Unclassified
Component: Binary packages support (show other bugs)
Hardware: All Linux
: Normal normal
Assignee: Portage team
URL:
Whiteboard:
Keywords: InVCS
Depends on:
Blocks: 710444
  Show dependency tree
 
Reported: 2017-10-21 11:27 UTC by Martin Väth
Modified: 2024-02-28 16:08 UTC (History)
1 user (show)

See Also:
Package list:
Runtime testing required: ---


Attachments
Support BINPKG_COMPRESS_FLAGS="--long=..." with zstd (portage-zstd-long.patch,286 bytes, patch)
2018-08-02 10:50 UTC, Martin Väth
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Martin Väth 2017-10-21 11:27:21 UTC
It is great that portage-2.3.12 now correctly handles zstd for tbz2 archives.

Current versions of zstd also support the long-range option --long=31 (e.g.) which especially for large packages like the linux kernel can remarkably decrease the archive size, essentially at the cost of (in this example) 2GB memory requirement for unpacking.

Unfortunately, it is currently not possible to put this option into BINPKG_COMPRESS_FLAGS, because then an option must be specifying during unpacking which allows for the allocation of more memory.

Therefore, it would be nice, if portage would support e.g. a BINPKG_UNCOMPRESS_FLAGS_ZSTD variable which is passed to unzstd for decompression. Perhaps also other variables like BINPKG_UNCOMPRESS_FLAGS_XZ would make sense, because unxz has a similar issue concerning memory limits.
Comment 1 Martin Väth 2018-08-02 10:50:08 UTC
Created attachment 542086 [details, diff]
Support BINPKG_COMPRESS_FLAGS="--long=..." with zstd

With the attached patch, zstd will decompress tbz2 files which were compressed with e.g.

BINPKG_COMPRESS="zstd"
BINPKG_COMPRESS_FLAGS="--long=29"

The passed option --memory=4294967295 might look agressive, but I tested:

If this option is used, it is *not* required that the option has the required memory. In fact, the amount of memory allocated by zstd with that option is exactly the necessary amount to decompress the given file.

In other words, the only effect of this option is that zstd does not *refuse* to decompress a file whose decompression which requires more memory than the default.
Comment 2 Martin Väth 2018-08-02 10:52:02 UTC
s/that the option has the required memory/that the machine has the mentioned amount of memory/
Comment 3 Zac Medico gentoo-dev 2018-08-02 17:12:41 UTC
(In reply to Martin Väth from comment #1)
> The passed option --memory=4294967295 might look agressive, but I tested:

Where did you get this number? It would be nicer if we could simply tell it to use as much memory as necessary, rather than a specific number.
Comment 4 Martin Väth 2018-08-02 18:55:09 UTC
> simply tell it to use as much memory as necessary

That's exactly what zstd -d does:
It only allocates the window size (which is stored in the file)
needed to decompress the file, independent of the passed --memory option.

The purpose of --memory is only to serve as a sanity check:
zstd dies if the window size stored in the file is larger than specified
by the --memory option (maybe the alternative equivalent name --memlimit-decompress makes this clearer).

The default value of that option is rather small
1 << 27 = 134217728 = 128 MB
(The reason is that a longer window is never used unless you explicitly pass --long=... for compression)

> Where did you get this number

It is the longest number which can be passed, namely the longest number
fitting into a 32 bit unsigned, thus making zstd accept every possible file.

It would be completely equivalent to use the value 1 << 31, because - as mentioned - zstd -d uses the window size stored in the file, and the largest possible window is obtained with --long=31.
Comment 5 Zac Medico gentoo-dev 2018-08-04 18:08:23 UTC
I've just tested with zstd-1.3.5, and the largest --memory value it will accept is --memory=4294967289 since this commit:

https://github.com/facebook/zstd/commit/9cd5c63771a21c5769366e058d1d8bf1cea89970
Comment 6 Zac Medico gentoo-dev 2018-08-04 18:29:03 UTC
I also tested with K (1 << 10) and M (1 << 20) suffixes, and the max allowed values for those are --memory=4194303K and --memory=4095M.
Comment 7 Zac Medico gentoo-dev 2018-08-04 18:45:03 UTC
So --long=31 is the max --long value, which corresponds to a --memory value of 1 << 31, which is equivalent to --memory=2048M. So, how about if we use --memory=2048M for readability?
Comment 8 Zac Medico gentoo-dev 2018-08-04 18:53:17 UTC
Actually maybe it's easier to simply pass --long=31 to the decompressor, after reading this part of the man page:

> windowLog=wlog, wlog=wlog
> 
>     Specify the maximum number of bits for a match distance.
> 
>     The higher number of increases the chance to find a match which
>     usually improves compression ratio. It also increases  memory
>     requirements for the compressor and decompressor. The minimum wlog
>     is 10 (1 KiB) and the maximum is 30 (1 GiB) on 32-bit platforms
>     and 31 (2 GiB) on 64-bit platforms.
> 
>     Note: If windowLog is set to larger than 27, --long=windowLog or
>     --memory=windowSize needs to be passed to the decompressor.
Comment 9 Zac Medico gentoo-dev 2018-08-04 19:09:15 UTC
We'll have to do a 32-bit build to test if --long=31 is allowed for decompression on 32-bit systems, since zstd has conditionals like this:

> #define ZSTD_WINDOWLOG_MAX    ((unsigned)(sizeof(size_t) == 4 ? ZSTD_WINDOWLOG_MAX_32 : ZSTD_WINDOWLOG_MAX_64))
Comment 10 Martin Väth 2018-08-04 19:19:43 UTC
(In reply to Zac Medico from comment #5)
> I've just tested with zstd-1.3.5

I didn't check newer versions than in the gentoo repository...

> --memory=4294967289

This looks like a laziness bug: The commit checks whether 429496728 (without the last number) is at most than MAX_UINT/10 - 1. The "K" and "M" variants of the same commit work differently and do not suffer from this bug.

To avoid such problems, the number should probably be decreased. As mentioned, specifying the number 2 << 31 is the same.

> maybe it's easier to simply pass --long=31 to the decompressor

The CLI code handles this option differently by setting not only memlimit but also compressionParams.windowLog to 31 which is passed to the (de?)compression algorithm. I was afraid that this might override the window size stored in the file, but it appears that this value is used only for compression:

command time -f %M unzstd --long=31 ...

did not show that an unusual large amount of memory was allocated, and decompression worked with files not compressed with --long=31.

So probably you are right: --long=31 should be fine.
Comment 11 Martin Väth 2018-08-04 19:44:12 UTC
(In reply to Zac Medico from comment #9)
> We'll have to do a 32-bit build to test if --long=31 is allowed

I tested on my x86 chroot:
--long=31 is not accepted for compressing, but
--long=31 is accepted for decompressing.

Unsurprisingly, in no case it is possible to decompress files compressed with --long=31 on x86, but it does not hurt to use this option for decompressing, i.e. it is not necessary to make a case distinction for decompressing. (For compression, the situation would be different)
Comment 12 Larry the Git Cow gentoo-dev 2018-08-04 20:27:05 UTC
The bug has been referenced in the following commit(s):

https://gitweb.gentoo.org/proj/portage.git/commit/?id=f391b2cc5384fc38e99a0598cb3de2346e297c25

commit f391b2cc5384fc38e99a0598cb3de2346e297c25
Author:     Zac Medico <zmedico@gentoo.org>
AuthorDate: 2018-08-04 20:18:47 +0000
Commit:     Zac Medico <zmedico@gentoo.org>
CommitDate: 2018-08-04 20:25:23 +0000

    compression_probe: decompress zstd --long=31 (bug 634980)
    
    In order to decompress files compressed with zstd --long=31, add
    --long=31 to the zstd decompress options. Even though zstd compression
    does not support --long=31 on 32-bit platforms, decompression with
    --long=31 still works as long as the file was compressed with a
    smaller windowLog.
    
    Reported-by: Martin Väth <martin@mvath.de>
    Bug: https://bugs.gentoo.org/634980

 lib/portage/util/compression_probe.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
Comment 13 Zac Medico gentoo-dev 2020-03-14 23:05:03 UTC
With app-arch/zstd-1.4.4-r2, the --long=31 argument does not work for 32-bit architectures, see bug 710444.
Comment 14 Larry the Git Cow gentoo-dev 2020-03-14 23:40:32 UTC
The bug has been referenced in the following commit(s):

https://gitweb.gentoo.org/proj/portage.git/commit/?id=07da257cfc80509c50104560b1e1508b9e585b98

commit 07da257cfc80509c50104560b1e1508b9e585b98
Author:     Zac Medico <zmedico@gentoo.org>
AuthorDate: 2020-03-14 23:18:55 +0000
Commit:     Zac Medico <zmedico@gentoo.org>
CommitDate: 2020-03-14 23:21:55 +0000

    compression_probe: omit zstd --long=31 on 32-bit arch (bug 710444)
    
    Omit the zstd --long=31 argument for decompression on 32-bit
    architectures, since the latest version of zstd will otherwise
    abort with an error on 32-bit architectures.
    
    Bug: https://bugs.gentoo.org/710444
    Bug: https://bugs.gentoo.org/634980
    Signed-off-by: Zac Medico <zmedico@gentoo.org>

 lib/portage/util/compression_probe.py | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)