In sys-apps/portage-2.1.2-r9 the default compression for man pages and docs is set too high to make any sense. "bzip2 -9" generates larger files than "gzip -1" when you are dealing with small man pages. The only time "bzip2 -9" gets the upper hand is with the few really big man pages. There is no documentation in /etc/make.conf.example on how to change the compression format, and there is no way to set the compression level.

Reproducible: Always

Steps to Reproduce:
1. emerge app-arch/gzip

Actual Results:
zdiff.1.bz2 is larger than the old zdiff.1.gz was. Here is a small list of sizes with different compression settings:

  zdiff.1          : 802
  zdiff.1.bz2 (-1) : 461
  zdiff.1.bz2 (-5) : 461
  zdiff.1.bz2 (-9) : 461
  zdiff.1.gz  (-1) : 439
  zdiff.1.gz  (-5) : 429
  zdiff.1.gz  (-9) : 429

Expected Results:
I had expected a saner choice of compression. I would suggest "gzip -5" as the default, since it doesn't generate much larger man pages than "gzip -9" does, and of course there should be good documentation in /etc/make.conf.example on how to change the defaults.
Use PORTAGE_COMPRESS_FLAGS, which is documented in `man make.conf`.
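For example, a minimal /etc/make.conf sketch (the values here are illustrative, not the defaults; both variables are documented in `man make.conf`):

  # choose the compressor and its options for man/doc compression
  PORTAGE_COMPRESS="gzip"
  PORTAGE_COMPRESS_FLAGS="-9"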
(In reply to comment #1)
> Use PORTAGE_COMPRESS_FLAGS, which is documented in `man make.conf`.

The issue is not the possibility of choosing another compression system and level, it's the default setting. What I can't understand is the reason to change such a setting automagically, instead of providing instructions in the config files for changing to bzip2... changing something like this without the system admin's authorization is really not a good idea.
Yes, bzip2 -9 is overkill, and will lead to MUCH longer decompression times compared to gzip -9. We're talking several times as slow here. Yes, really. And in most cases, it won't save a single byte of disk space either, unless the difference between the gzip-compressed and bzip2-compressed file is enough to cross a block size boundary. With 4 kB blocks, it doesn't matter whether a file is 8400 or 9000 bytes; it'll take up the same disk space. So the overall savings are VERY minimal, at a VERY high price in CPU and RAM use.

Also, the PORTAGE_COMPRESS and PORTAGE_COMPRESS_FLAGS variables have been implemented, er, shall we diplomatically say, less than optimally. If set to a null string, they work the same as if unset, and you get "bzip2" and "-9". You have to set PORTAGE_COMPRESS to a null command (like ":") and PORTAGE_COMPRESS_FLAGS to a non-altering option (like " ") to avoid compression.

Then there's the point that /usr/share should be distributable across architecture boundaries, like sharing /usr/share/man over NFS (a historical practice). If compressed with bzip2, the man pages no longer work on Unix-like systems that *don't* have bzip2. That's most of them. If the pages aren't meant to be shared, they should go under /usr/man and not /usr/share/man.

Also, the compression of files in /usr/share/doc means that much of the documentation won't be readily available. Tried reading HTML docs that have been compressed, and following the links between pages? While it makes sense to compress man pages with a fast compression algorithm on systems with slow storage, as this actually *increases* speed, it doesn't likewise make sense to compress documentation. Especially not when the files cross-reference each other by name.

(And, unfortunately, someone changed the distributed /etc/man.conf to use bzip2 -9 for catman pages too. It should, of course, use compress or gzip -9, because the WHOLE POINT of compressing catman pages is to increase access speed, not to save disk space. If you want to save disk space, you don't use a cache.)

Anyhow, reverting from bzip2 to gzip should go without saying, as there's no conceivable use for bzip2 outside special situations where a few kB saved is paramount (liveCDs, keyfob distros) and CPU/RAM usage is of little importance. And there should, of course, be an option to keep the doc files uncompressed for convenience even if you want the man pages compressed for speed.
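To illustrate the workaround described above (a sketch based on this comment's description; behavior may vary between portage versions):

  # /etc/make.conf -- effectively disable compression, since empty
  # values just fall back to the bzip2 -9 defaults
  PORTAGE_COMPRESS=":"        # null command, does nothing
  PORTAGE_COMPRESS_FLAGS=" "  # non-altering option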
Also, here's a small test that also measures the speed and actual disk usage. Assuming man.1 is a typical man page (in size, it appears to be fairly close to the median), I get the following test results:

  Name       Command    Size   Disk*  Decomp. speed**
  man.1      none       12297  16 kB  1.207 s
  man.1.Z    compress    6398   8 kB  1.357 s
  man.1.gz   gzip -1     5332   8 kB  1.439 s
  man.1.gz   gzip -9     4812   8 kB  1.370 s
  man.1.bz2  bzip2 -1    4618   8 kB  2.735 s
  man.1.bz2  bzip2 -9    4618   8 kB  2.857 s

  *: Assuming 4 kB block size.
  **: 1000 iterations, best result of 3 consecutive runs.

Equipment used: 1.79 GHz Athlon-4 M (Compaq Presario 2175US)
File system: XFS, noatime.

Commands used:
  time for i in `seq 1 1000`; do cat man.1 >/dev/null; done
  time for i in `seq 1 1000`; do zcat man.1.Z >/dev/null; done
  time for i in `seq 1 1000`; do gzcat man.1.gz >/dev/null; done
  time for i in `seq 1 1000`; do bzcat man.1.bz2 >/dev/null; done

From this is subtracted the time of the loop construct (0.009 s):
  time for i in `seq 1 1000`; do :; done

In other words, for this man page, you save *nothing* by using bzip2 -9 instead of gzip -9, and the net effect is to more than double the extraction time.
Mike, any objection to changing this back? I think most of the ecompress bugs should be found/fixed by now (which I assume was the main motivation for changing the defaults to bzip2).
(In reply to comment #5)
> Mike, any objection to changing this back? I think most of the ecompress
> bugs should be found/fixed by now (which I assume was the main motivation
> for changing the defaults to bzip2)

Yeah, the bzip2 was great for flushing out bugs, but gzip is probably a suitable long-term default. Mike, please close this bug when you get a chance.
Hang on! According to my tests, bzip2 -5 is faster than gzip. Instead of using a specific man page, I compressed every file in /usr/share/man with a given setting (gzip -1, gzip -5, gzip -9, bzip2 -1, bzip2 -5, bzip2 -9) and measured the time to decompress all the files, sending the output to /dev/null. Each decompression was performed three times in a row, just after booting the PC in single-user mode.

The times:

  bzip2 -1: real 1m30.190s 0m50.056s 0m50.068s
            user 0m24.878s 0m25.334s 0m25.574s
            sys  0m26.174s 0m24.722s 0m24.494s
  bzip2 -5: real 1m29.377s 0m50.799s 0m50.759s
            user 0m25.222s 0m24.958s 0m24.998s
            sys  0m26.438s 0m25.842s 0m25.762s
  bzip2 -9: real 1m40.974s 0m50.769s 0m50.551s
            user 0m25.578s 0m25.598s 0m25.034s
            sys  0m26.190s 0m25.170s 0m25.518s
  gzip -1:  real 1m52.998s 1m6.384s 1m6.536s
            user 0m34.050s 0m33.594s 0m33.330s
            sys  0m33.598s 0m32.790s 0m33.206s
  gzip -5:  real 1m48.797s 1m6.475s 1m6.375s
            user 0m33.658s 0m33.018s 0m34.094s
            sys  0m33.474s 0m33.458s 0m32.282s
  gzip -9:  real 1m47.781s 1m6.567s 1m6.104s
            user 0m34.082s 0m33.122s 0m33.542s
            sys  0m33.150s 0m33.446s 0m32.562s

The sizes:

  40M man-bzip2-1
  39M man-bzip2-3
  39M man-bzip2-4
  39M man-bzip2-5
  39M man-bzip2-9
  42M man-gzip-1
  40M man-gzip-5
  40M man-gzip-9
  72M man-u (uncompressed)

Some information about my PC:

  sudo hdparm -tT /dev/hda
  /dev/hda:
   Timing cached reads:        438 MB in 2.00 seconds = 218.95 MB/sec
   Timing buffered disk reads: 160 MB in 3.01 seconds =  53.19 MB/sec

  uname -a
  Linux jorge 2.6.21-gentoo-r2 #5 PREEMPT Sat Jun 9 12:49:08 BRT 2007 i686 AMD Athlon(tm) XP 2600+ AuthenticAMD GNU/Linux
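(The exact commands weren't posted; here's a sketch of the kind of test described, assuming per-setting copies of the man tree named like man-gzip-9:)

  cp -a /usr/share/man man-gzip-9
  find man-gzip-9 -type f -exec gzip -9 {} +
  # run three times in a row; only the first run is cold-cache
  time find man-gzip-9 -type f -name '*.gz' -exec zcat {} + >/dev/null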
Two corrections:
1) I said bzip2 -5, but I meant bzip2 -1.
2) I should have given the sizes in KiB (du's default block size), not MB. So, removing the -h flag from du:

  du -s man*
  40017 man-bzip2-1
  39913 man-bzip2-3
  39905 man-bzip2-4
  39905 man-bzip2-5
  39901 man-bzip2-9
  42915 man-gzip-1
  40751 man-gzip-5
  40651 man-gzip-9
  72802 man-u
jorge's numbers are probably flawed, since he cold-booted and then started the test. That means that his first bzip2 run read from disk instead of from cached memory. I would trust the numbers in comment #4. And I agree, the default should be gzip -9, since it takes the least time for decompression.
(In reply to comment #9)
> And I agree, the default should be gzip -9, since it takes the least time
> for decompression.

Well, actually pack/unpack (lower-case .z files) is the fastest for decompression, which is why several commercial Unixes use it for catman. And lzop (.lzo files) is faster too. But pack isn't easily available for Linux due to licensing, and lzop, like bzip2, isn't a standard install, and thus should not be used on anything that could be shared between machines.

Anyhow, out of what we have to work with, gzip is, by far, the most available these days, and its decompression speed is relatively fast. So yeah, gzip is the sensible choice, next to "no compression".
For what it's worth, I ran some tests to figure out this dilemma today. I have a relatively good laptop SATA drive, ext4 w/ noatime, 2.5 GHz Core i5, hardened kernel 3.7.5.

Tests on an average-sized (2-4 KB) manpage show that I/O costs about 400 ms for both uncompressed and compressed versions of the manpage, while decompression costs 1-3 ms for everything between lzo -1 and xz -9, including gzip -X and bzip2 -X. The largest manpage I have (perltoc.1, over 1 MB) takes 20 ms to unxz -9 and 5 ms to unlzop -1.

People with mechanical storage have fast CPUs, so I find that it doesn't really matter to them what kind of compression (if any) they use. People with SSD/flash storage care less about I/O. I cannot test how little they care, but I suppose there is negligible speed difference between reading 4 KB and 1 MB. Some of them are the modern laptop users, who don't even care about CPU demands; others are the embedded people, who might care about CPU.

I conclude that the best default is no compression. The embedded folks will be 5% happier, and the rest of us will get 1 second faster merge times per package. Also, things will just be generally a tiny bit simpler. I suppose no one today cares about disk space when we're talking manpages and docs. The rare few embedded users who care about disk space on the scale of tens to a few hundred MB will have to explicitly turn on compression and wait a bit longer for manpages to display.

I'm interested in any ideas concerning this evaluation.
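(For anyone wanting to reproduce this, a sketch of the kind of measurement described; the page path is just an example, and drop_caches needs root:)

  # cold-cache timing of a single compressed manpage
  sync; echo 3 > /proc/sys/vm/drop_caches
  time xzcat /usr/share/man/man1/perltoc.1.xz >/dev/null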
Oops, a typo there: s/400 ms/40 ms/. And reading 1 MB costs about twice as much. But I consider anything <100 ms as if it's not there.

It might also be correct to consider the number of embedded people just too low compared to others and give higher priority to the <100 ms concerns of us SATA regulars. In that case, the best default would probably be xz -9: for small files there is no hurt; for larger files the reduced I/O saves the day.

What do you think? It seems consensus is yet to be reached in this bug.
Maybe we should aim for conformity with other Linux distros, or at least look at the reasoning other distros used to choose their default compression? Looking at an Ubuntu 12.10 system, the default compression appears to be gzip.
(In reply to comment #13)

tl;dr: skip to end :p

i don't think any other distro has picked anything other than gzip, and most likely for legacy reasons ("it's always been that way") rather than anyone doing serious research.

i did numbers analysis a long time ago on this, but i'm not sure if i didn't post it (or i posted it to a diff bug or mailing list or ...). doesn't matter.

keep in mind that portage (currently) only supports one compression scheme, so focusing on just /usr/share/man/ doesn't make sense. /usr/share/doc/ has a lot more files and makes much more sense to compress harder, as those aren't actively being decompressed on a day-to-day basis.

i'm not too interested in the claim that this "only matters for embedded". i've got plenty of systems where disk is at a premium, and they aren't embedded. to turn the logic around: if your system is "beefy", then the decompression overhead should be irrelevant.

for man pages, the focus here should be on decompression. we don't generally care how long it takes to compress the files.

on my system of ~17500 man pages (which i think is a bit on the unusual side in terms of total # of files), the compressed sizes we have:

  bzip2-9       37363270  35%
  bzip2-8       37363846  35%
  bzip2-7       37364252  35%
  bzip2-6       37367266  35%
  bzip2-5       37377376  35%
  bzip2-4       37382266  35%
  bzip2-3       37412669  35%
  bzip2-2       37476800  35%
  xz-7          37648808  36%
  xz-6          37648810  36%
  xz-8          37648811  36%
  xz-9          37648811  36%
  xz-5          37674888  36%
  bzip2-1       37692413  36%
  xz-4          37914222  36%
  gzip-9        38610832  37%
  gzip-8        38610944  37%
  gzip-7        38625440  37%
  gzip-6        38661329  37%
  gzip-5        38860087  37%
  xz-3          39039534  37%
  xz-2          39132314  37%
  xz-1          39434849  37%
  gzip-4        39607037  38%
  gzip-3        41248262  39%
  gzip-2        41971804  40%
  gzip-1        43059502  41%
  lzop-9        46518382  44%
  lzop-8        46518522  44%
  lzop-7        46757820  44%
  lzop-5        57622842  55%
  lzop-6        57622842  55%
  lzop-4        57622843  55%
  lzop-2        57622844  55%
  lzop-3        57622844  55%
  lzop-1        57843543  55%
  uncompressed 104150218

while for compression time (src & dst in tmpfs on an otherwise quiet system):

  lzop-2    0.32user 0.36system 0:00.74elapsed 91%CPU
  lzop-1    0.37user 0.30system 0:00.71elapsed 94%CPU
  lzop-4    0.38user 0.29system 0:00.73elapsed 91%CPU
  lzop-5    0.39user 0.26system 0:00.94elapsed 69%CPU
  lzop-6    0.39user 0.27system 0:00.99elapsed 66%CPU
  lzop-3    0.41user 0.28system 0:00.84elapsed 82%CPU
  gzip-1    1.81user 0.22system 0:02.07elapsed 98%CPU
  gzip-2    1.93user 0.19system 0:02.16elapsed 98%CPU
  gzip-3    2.05user 0.23system 0:02.32elapsed 98%CPU
  gzip-4    2.34user 0.18system 0:02.59elapsed 97%CPU
  gzip-5    2.75user 0.27system 0:04.56elapsed 66%CPU
  gzip-6    3.26user 0.26system 0:04.67elapsed 75%CPU
  gzip-7    3.47user 0.21system 0:04.73elapsed 77%CPU
  gzip-8    3.76user 0.27system 0:05.36elapsed 75%CPU
  gzip-9    3.76user 0.25system 0:04.94elapsed 81%CPU
  lzop-7    5.56user 0.36system 0:06.40elapsed 92%CPU
  lzop-8    8.64user 0.25system 0:09.43elapsed 94%CPU
  lzop-9    8.72user 0.30system 0:09.57elapsed 94%CPU
  bzip2-3  13.78user 0.76system 0:14.76elapsed 98%CPU
  bzip2-1  13.87user 0.78system 0:15.00elapsed 97%CPU
  bzip2-4  13.87user 0.63system 0:14.88elapsed 97%CPU
  bzip2-2  13.88user 0.64system 0:14.60elapsed 99%CPU
  bzip2-7  13.95user 2.24system 0:20.93elapsed 77%CPU
  bzip2-9  13.97user 1.32system 0:19.80elapsed 77%CPU
  bzip2-5  13.99user 0.71system 0:18.37elapsed 80%CPU
  bzip2-8  14.06user 1.73system 0:20.29elapsed 77%CPU
  bzip2-6  14.12user 2.26system 0:21.36elapsed 76%CPU
  xz-1     16.51user 0.70system 0:18.59elapsed 92%CPU
  xz-2     28.51user 0.82system 0:30.79elapsed 95%CPU
  xz-3     43.86user 0.96system 0:45.93elapsed 97%CPU
  xz-4     56.64user 0.82system 0:58.54elapsed 98%CPU
  xz-5    154.20user 1.00system 3:17.77elapsed 78%CPU
  xz-6    156.40user 0.98system 3:16.90elapsed 79%CPU
  xz-7    261.04user 0.90system 5:06.92elapsed 85%CPU
  xz-9    401.04user 1.44system 7:26.30elapsed 90%CPU
  xz-8    402.18user 1.52system 7:26.45elapsed 90%CPU

and for decompression time:
  - src is on a RAID of 7200 rpm disks
  - dst is /dev/null -- normally we write to a pipe/tty which is RAM anyways
  - `echo 3 > /proc/sys/vm/drop_caches` before each set of tests
  - e.g. find <dir> -exec xzcat {} + >/dev/null
  - system was otherwise quiet

  lzop-6   1.25user 1.24system 0:10.57elapsed 23%CPU
  lzop-1   1.31user 1.25system 0:12.76elapsed 20%CPU
  lzop-2   1.34user 1.22system 0:14.08elapsed 18%CPU
  lzop-4   1.35user 1.19system 0:12.42elapsed 20%CPU
  lzop-7   1.40user 1.14system 0:09.94elapsed 25%CPU
  lzop-3   1.41user 1.16system 0:14.13elapsed 18%CPU
  lzop-8   1.41user 1.15system 0:10.31elapsed 24%CPU
  lzop-5   1.42user 1.06system 0:08.99elapsed 27%CPU
  lzop-9   1.50user 1.07system 0:10.72elapsed 24%CPU
  gzip-8   3.63user 0.83system 0:10.33elapsed 43%CPU
  gzip-7   3.81user 0.90system 0:09.13elapsed 51%CPU
  gzip-6   3.84user 0.91system 0:11.71elapsed 40%CPU
  gzip-5   3.90user 0.90system 0:11.66elapsed 41%CPU
  gzip-9   3.90user 0.82system 0:10.79elapsed 43%CPU
  gzip-1   3.96user 1.21system 0:14.43elapsed 35%CPU
  gzip-4   3.97user 0.94system 0:12.77elapsed 38%CPU
  gzip-3   4.00user 0.91system 0:12.36elapsed 39%CPU
  gzip-2   4.02user 0.96system 0:13.33elapsed 37%CPU
  xz-5     6.20user 0.69system 0:12.01elapsed 57%CPU
  xz-7     6.30user 0.68system 0:10.60elapsed 65%CPU
  xz-1     6.64user 0.63system 0:11.27elapsed 64%CPU
  xz-3     6.71user 0.76system 0:11.15elapsed 67%CPU
  xz-2     6.76user 0.82system 0:11.38elapsed 66%CPU
  xz-8     6.92user 0.68system 0:12.43elapsed 61%CPU
  xz-6     6.99user 0.72system 0:11.43elapsed 67%CPU
  bzip2-5  7.18user 0.81system 0:12.00elapsed 66%CPU
  xz-4     7.26user 0.63system 0:13.78elapsed 57%CPU
  xz-9     7.44user 0.82system 0:13.40elapsed 61%CPU
  bzip2-6  7.91user 0.88system 0:12.98elapsed 67%CPU
  bzip2-2  8.11user 0.93system 0:13.66elapsed 66%CPU
  bzip2-9  8.11user 0.89system 0:12.84elapsed 70%CPU
  bzip2-8  8.20user 0.96system 0:14.17elapsed 64%CPU
  bzip2-4  8.57user 1.01system 0:15.31elapsed 62%CPU
  bzip2-1  8.65user 1.08system 0:16.54elapsed 58%CPU
  bzip2-7  9.06user 1.05system 0:15.19elapsed 66%CPU
  bzip2-3  9.38user 0.97system 0:17.02elapsed 60%CPU

bzip2 is the worst in decompression speeds, but has the best compressed result. keep in mind that xz excels with large inputs, but not so much with small inputs, which is probably why bzip2 beats it out here (many small files). lzo is kind of interesting, but not really: the fact that our current man packages don't support it out of the box means that it's not useful (we could fix that, but eh).

i think as a default, we can sacrifice <10MiB for the majority of systems to get better decompression speeds. we can probably go even better and skip compressing files below a certain threshold (like <128 bytes), because those usually don't compress at all (they get bigger in fact), which means we gain literally nothing but lose out in every other way. pretty much every single one of those is a '.so' redirect to a different man page. if we wanted to get tricky, we might even consider deleting the .so altogether and rewriting it into a symlink ...

so what i think we should do is split the compression vars. PMS has no say in this area -- it talks about it in the abstract, as in "things may be compressed", but doesn't require any particular variable/compression scheme/flags/etc... thus, for man pages, we default to `gzip -6`. for other files (i.e. /usr/share/doc/), we keep the default of `bzip2 -9`. and we skip compression entirely on man pages under 128 bytes.
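(a sketch of the proposed size cutoff -- not the actual ecompress code; ${D} is portage's image directory, and the 128-byte threshold and `gzip -6` are the values suggested above:)

  # files <=128 bytes usually grow when compressed -- leave them alone,
  # compress everything larger with the proposed man-page default
  find "${D}"/usr/share/man -type f -size +128c -exec gzip -6 {} +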
OK, but in that case, why gzip -6? According to your results, shouldn't it ideally be more like this?

  <128 B:          no compression
  128 B - 4 KiB:   still no compression, because the I/O costs the same for
                   everything in this range (I haven't measured, but I believe
                   it's true), so one might as well avoid the decompression phase
  4 KiB - xy KiB:  bzip2 -9
  >xy KiB:         xz -9 or even -9e

I don't have /usr/share/doc, so I can't evaluate how much 'xy' should be, or whether the files there are large enough to even consider xz. However, we could consider the fact that xz -9 and bzip2 -9 both compress the manpages to an almost equivalent size - the difference is ~0.7% in your case. That easily rounds to nothing. Combined with the faster decompression of xz -9, the picture gets even simpler (see the sketch below):

  <=4 KiB: no compression
  >4 KiB:  xz -9 or -9e

A more meticulous person might want to insert something intermediate that decompresses faster than xz -9 but still produces compressed files <4 KiB, so I/O time is not affected. Something like gzip -1. But it probably wouldn't really make much difference. On the other hand, the embedded few would get happier - gzip -1 is fast, and the limit would cover most manpages, I guess(?).
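(a sketch of the simplified graduated scheme above; the threshold and commands are purely illustrative:)

  # pick a compressor based on the uncompressed size of $f
  size=$(stat -c %s "$f")
  if [ "$size" -le 4096 ]; then
      :               # <=4 KiB: leave uncompressed, I/O dominates anyway
  else
      xz -9 "$f"      # >4 KiB: xz -9 (or -9e)
  fi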
(In reply to comment #15)

i've posted a patch to disable compression on man pages <=128B.

looking at the man pages locally, after deleting everything <=128B, there is literally no man page that falls into the <=4KiB region. the next smallest page is 64KiB. so i don't think we need to bother getting into that ugly rabbit hole of graduated compression.

i do have /usr/share/doc/. i don't build with USE=doc or USE=examples, but my dir is still 603MiB. if you include those other flags, that'll get much bigger.

re-running those tests with all the small files (<=128B) trimmed:

  bzip2-9       5722229  23%
  bzip2-8       5722805  23%
  bzip2-7       5723211  23%
  bzip2-6       5726221  23%
  bzip2-5       5736335  23%
  bzip2-4       5741225  23%
  bzip2-3       5771628  24%
  bzip2-2       5835757  24%
  xz-7          5913491  24%
  xz-9          5913492  24%
  xz-6          5913493  24%
  xz-8          5913496  24%
  xz-5          5931178  24%
  xz-4          6036545  25%
  bzip2-1       6051371  25%
  xz-3          6318873  26%
  xz-2          6372501  26%
  xz-1          6518569  27%
  gzip-9        6704404  27%
  gzip-8        6704492  27%
  gzip-7        6712945  27%
  gzip-6        6729971  28%
  gzip-5        6834475  28%
  gzip-4        7092539  29%
  gzip-3        7529731  31%
  lzop-9        7694473  32%
  lzop-8        7694587  32%
  lzop-7        7797339  32%
  gzip-2        7814169  32%
  gzip-1        8182159  34%
  lzop-2       10803656  45%
  lzop-4       10803656  45%
  lzop-5       10803656  45%
  lzop-6       10803656  45%
  lzop-3       10803658  45%
  lzop-1       10887167  45%
  uncompressed 23995226

compression times:

  lzop-6    0.05user 0.02system 0:00.13elapsed 60%CPU
  lzop-1    0.06user 0.02system 0:00.09elapsed 96%CPU
  lzop-2    0.07user 0.02system 0:00.10elapsed 95%CPU
  lzop-5    0.07user 0.01system 0:00.09elapsed 93%CPU
  lzop-3    0.08user 0.01system 0:00.10elapsed 95%CPU
  lzop-4    0.08user 0.01system 0:00.10elapsed 97%CPU
  gzip-1    0.34user 0.02system 0:00.37elapsed 97%CPU
  gzip-2    0.38user 0.01system 0:00.40elapsed 98%CPU
  gzip-3    0.46user 0.01system 0:00.47elapsed 98%CPU
  gzip-4    0.48user 0.03system 0:00.52elapsed 98%CPU
  gzip-5    0.68user 0.00system 0:00.91elapsed 75%CPU
  gzip-6    0.94user 0.01system 0:01.41elapsed 67%CPU
  gzip-7    1.06user 0.02system 0:01.22elapsed 88%CPU
  gzip-8    1.26user 0.00system 0:01.36elapsed 92%CPU
  gzip-9    1.27user 0.06system 0:01.45elapsed 92%CPU
  xz-1      1.75user 0.06system 0:01.86elapsed 97%CPU
  lzop-7    1.92user 0.02system 0:01.95elapsed 99%CPU
  bzip2-2   2.14user 0.06system 0:02.21elapsed 99%CPU
  bzip2-3   2.15user 0.05system 0:02.23elapsed 99%CPU
  bzip2-4   2.16user 0.07system 0:02.26elapsed 98%CPU
  bzip2-5   2.16user 0.09system 0:02.27elapsed 99%CPU
  bzip2-1   2.17user 0.05system 0:02.25elapsed 98%CPU
  bzip2-7   2.20user 0.18system 0:02.99elapsed 79%CPU
  bzip2-8   2.21user 0.10system 0:02.98elapsed 77%CPU
  xz-2      2.23user 0.04system 0:02.29elapsed 99%CPU
  bzip2-6   2.24user 0.08system 0:02.75elapsed 84%CPU
  bzip2-9   2.25user 0.17system 0:03.04elapsed 80%CPU
  xz-3      2.72user 0.06system 0:02.83elapsed 98%CPU
  lzop-8    2.84user 0.01system 0:02.88elapsed 98%CPU
  lzop-9    2.91user 0.00system 0:02.97elapsed 97%CPU
  xz-4      5.89user 0.05system 0:05.99elapsed 99%CPU
  xz-5      8.04user 0.19system 0:09.28elapsed 88%CPU
  xz-6      8.80user 0.08system 0:10.70elapsed 83%CPU
  xz-7      9.34user 0.07system 0:12.47elapsed 75%CPU
  xz-8     10.65user 0.09system 0:13.18elapsed 81%CPU
  xz-9     10.82user 0.08system 0:12.82elapsed 85%CPU

decompression times:

  lzop-7    0.04user 0.01system 0:00.06elapsed 86%CPU
  lzop-9    0.04user 0.00system 0:00.06elapsed 87%CPU
  lzop-4    0.05user 0.00system 0:00.07elapsed 86%CPU
  lzop-6    0.05user 0.01system 0:00.07elapsed 86%CPU
  lzop-8    0.05user 0.00system 0:00.06elapsed 83%CPU
  lzop-1    0.06user 0.00system 0:00.07elapsed 82%CPU
  lzop-2    0.06user 0.00system 0:00.07elapsed 86%CPU
  lzop-3    0.06user 0.00system 0:00.08elapsed 76%CPU
  lzop-5    0.06user 0.00system 0:00.07elapsed 87%CPU
  gzip-8    0.23user 0.00system 0:00.35elapsed 68%CPU
  gzip-7    0.36user 0.03system 0:00.64elapsed 61%CPU
  gzip-1    0.37user 0.02system 0:00.80elapsed 49%CPU
  xz-7      0.38user 0.00system 0:00.40elapsed 97%CPU
  xz-6      0.39user 0.00system 0:00.40elapsed 97%CPU
  xz-8      0.39user 0.00system 0:00.40elapsed 97%CPU
  gzip-9    0.40user 0.02system 0:00.50elapsed 84%CPU
  xz-5      0.40user 0.01system 0:00.43elapsed 95%CPU
  xz-9      0.40user 0.00system 0:00.41elapsed 97%CPU
  gzip-2    0.41user 0.02system 0:00.60elapsed 73%CPU
  gzip-6    0.42user 0.00system 0:00.58elapsed 73%CPU
  gzip-3    0.44user 0.02system 0:00.64elapsed 72%CPU
  gzip-5    0.48user 0.02system 0:01.06elapsed 47%CPU
  xz-4      0.48user 0.01system 0:00.62elapsed 79%CPU
  xz-2      0.50user 0.03system 0:00.71elapsed 74%CPU
  gzip-4    0.51user 0.02system 0:00.93elapsed 57%CPU
  xz-1      0.58user 0.02system 0:00.78elapsed 77%CPU
  xz-3      0.60user 0.02system 0:00.86elapsed 71%CPU
  bzip2-7   0.64user 0.02system 0:00.72elapsed 92%CPU
  bzip2-6   0.69user 0.02system 0:00.82elapsed 86%CPU
  bzip2-9   0.69user 0.03system 0:00.85elapsed 84%CPU
  bzip2-4   0.71user 0.04system 0:00.89elapsed 83%CPU
  bzip2-3   0.72user 0.03system 0:00.96elapsed 78%CPU
  bzip2-8   0.72user 0.02system 0:00.94elapsed 78%CPU
  bzip2-2   0.73user 0.02system 0:00.95elapsed 78%CPU
  bzip2-5   0.73user 0.02system 0:00.89elapsed 85%CPU
  bzip2-1   0.76user 0.02system 0:01.07elapsed 73%CPU

these yield more interesting numbers. there is no compressed size difference between xz -[6789], while compression & decompression is better with xz -6. so with that in mind, i guess i'll change my recommendation to use that by default for man pages.
(In reply to comment #16)
> looking at the man pages locally, after deleting everything <=128B, there is
> literally no man page that falls into the <=4KiB region. the next smallest
> page is 64KiB. so i don't think we need to bother getting into that ugly
> rabbit hole of graduated compression.

First, note that the <=4 KiB bracket was not meant as part of the graduated scheme. The kernel ends up loading at least 4 KiB anyway, even if the file is smaller (IIRC from classes). But it needs a check - one that takes into consideration both the I/O for an individual file(!) upon decompression and the algorithm run time. Decompression must be non-deleting.

The working dir contains ~1000 uncompressed manpages, randomly selected from the sum of all manpages. ext4 + noatime, SATA disk.

Compress:
  sync; echo 3 > /proc/sys/vm/drop_caches; time bash -c 'for i in *; do XYzip -[1-9] $i; done'

Decompress:
  time bash -c 'for i in *; do sync; echo 3 > /proc/sys/vm/drop_caches; XYzip -kdc $i >/dev/null; done'

First "real" is compression time, then decompression:

  bzip2 -9   real 0m3.187s    real 0m18.562s
  bzip2 -5   real 0m2.775s    real 0m18.072s
  bzip2 -1   real 0m2.776s    real 0m18.803s
  xz -9      real 0m30.914s   real 0m18.651s
  xz -5      real 0m11.478s   real 0m24.592s
  xz -1      real 0m4.741s    real 0m20.107s
  xz -9e     real 0m30.912s   real 0m19.978s
  xz -5e     real 0m11.530s   real 0m20.758s
  xz -1e     real 0m5.645s    real 0m19.971s
  lzop -9    real 0m2.940s    real 0m15.741s
  lzop -5    real 0m2.073s    real 0m14.622s
  lzop -1    real 0m2.129s    real 0m14.634s
  gzip -9    real 0m3.799s    real 0m17.614s
  gzip -5    real 0m2.066s    real 0m16.527s
  gzip -1    real 0m1.883s    real 0m16.658s
  cat        real ---         real 0m13.754s

Again, I can't test on /usr/share/doc content, but I'm totally switching back to uncompressed.
portage no longer compresses man pages <=128B. i might look into adding a similar limit for doc files.
http://git.overlays.gentoo.org/gitweb/?p=proj/portage.git;a=commitdiff;h=b47ea7a91220d72b78547721cedb8a4ca6cec39e

that means for the remaining files, `bzip2 -9` does give the best compression ratio overall (which is the current default). we can look at adding a dedicated compression var for man pages, but it's much less of an issue now.
Note that xz compression is more effective for documentation with a profile containing "pb=0" than without. At least, this was true when I tested back in 2012. At the time, I used the following option in make.conf:

  PORTAGE_COMPRESS_FLAGS="--lzma2=preset=6e,pb=0"

These were my results at the time (the directories without a .bz2 suffix are those containing the xz-compressed material):

  # du -sb /usr/share/{doc.bz2,man.bz2,doc,man}
  16676478  /usr/share/doc.bz2
  17882717  /usr/share/man.bz2
  17393009  /usr/share/doc
  11889470  /usr/share/man

While xz didn't quite beat bzip2 -9 for compression of my doc directory, it yielded overall savings of 5.03 MiB. What's interesting is that pb=0 was the magic sauce that allowed it to exceed - rather than fall short of - the degree of compression applied by bzip2.

I'm not necessarily suggesting that xz is the best choice for Gentoo, but this is worth keeping in mind during any evaluation where the effectiveness of the compression is considered important.
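(For completeness, the full pair of make.conf settings this implies - a sketch; the flags line is the one quoted above, and PORTAGE_COMPRESS must point at xz for those flags to apply:)

  PORTAGE_COMPRESS="xz"
  PORTAGE_COMPRESS_FLAGS="--lzma2=preset=6e,pb=0"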