This may seem like a lot of effort for a two-digit MB space saving, but I think some people will find it useful. This submission consists of 3 files that make portage bzip2 man and info files, plus some utility changes.

The first attachment is a patch to a few portage scripts in /usr/lib/portage/bin: doinfo, doman, prepallinfo, prepallman, prepinfo, and prepman. Apply it with: cd /usr/lib; patch -p0 < portage-maninfo-bz2.patch. I may have gone a little overboard with the recompression support (gunzip; bzip2), but I wasn't sure how developers are packaging their man/info files. These changes should apply cleanly to portage 2.0.43.

The second attachment is a patch to texinfo that makes the install-info program support bzip2-compressed info pages. It only adds bzip2 support; gzipped info files still work. It was actually taken from Mandrake's texinfo package.

The main complaint about this change is that bzip2 is slow. If you take advantage of man's ability to cache pages, you can solve this problem: turn on man's caching in /etc/man.conf (make sure the NOCACHE line is commented out, #NOCACHE) and turn off compression of cat pages (comment out COMPRESS, #COMPRESS), so man will cache formatted, uncompressed man pages. The first time you view a man page it will be a little slower, but the second time it will fly. The third attachment is a suggested man.conf with these changes.

You will also need to create a directory structure in /var/cache/man owned by group man, because man will not create it automatically. This is pretty simple: cd /var/cache/man; mkdir -m 0575 cat{1,2,3,4,5,6,7,8,9,n}; chgrp man cat*. To keep /var/cache/man clean, check out my ebuild submission for tmpwatch, Bug 9817: http://bugs.gentoo.org/show_bug.cgi?id=9817

I saw approximately a 50% space saving on comparable installations: the bzip2 man page directory tree was 11 MB, while the gzip one was 22 MB.
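For anyone curious what the "gunzip; bzip2" recompression amounts to, here is a minimal sketch of that step as a shell function. It is not code from portage-maninfo-bz2.patch; the name recompress_bz2 is hypothetical, and it assumes gzip and bzip2 are on PATH. Files ending in .gz are unpacked and re-packed as .bz2, .bz2 files are left alone, and anything else is simply bzip2'd.

```shell
# Hypothetical sketch, not the code from portage-maninfo-bz2.patch.
# recompress_bz2 FILE...: foo.1.gz -> foo.1.bz2, foo.1 -> foo.1.bz2,
# foo.1.bz2 is left as it is.
recompress_bz2() {
    for f in "$@"; do
        case "$f" in
            *.bz2)
                : # already bzip2-compressed, nothing to do
                ;;
            *.gz)
                # gunzip in place (foo.1.gz -> foo.1), then bzip2 the result
                # (foo.1 -> foo.1.bz2); -f overwrites stale output files.
                gzip -d -f "$f" && bzip2 -f "${f%.gz}"
                ;;
            *)
                bzip2 -f "$f"
                ;;
        esac
    done
}
```

For example, recompress_bz2 foo.1.gz bar.1 should leave foo.1.bz2 and bar.1.bz2 behind. Doing it as two in-place steps keeps the sketch short; piping gzip -dc into bzip2 would avoid the intermediate uncompressed file.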
Obviously, it would be a bad idea to recompress all your existing man and info pages, because the renamed files would no longer match what Portage recorded as installed for each package. Once this change has been made, you can either emerge -e world (crazy :) or just keep emerging as usual, and all new packages will have their info and man pages bzip2-compressed. The man and info programs (and the patched install-info) can handle both compression formats simultaneously.

Lastly, the patches to the portage scripts are a one-way change: there is no way to go back to gzip. I think this change, if ever made official, should be a selectable option in /etc/make.conf. I do not think a USE flag is appropriate, although I'm not sure what is.
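Handling both formats side by side only requires picking the decompressor per file. A minimal sketch of that idea, with a hypothetical page_cat function that goes by file suffix (man and info have their own logic for this; this is not it):

```shell
# Hypothetical sketch: print a man/info page whatever its compression,
# choosing the decompressor from the file suffix.
page_cat() {
    case "$1" in
        *.bz2) bzip2 -dc "$1" ;;
        *.gz)  gzip -dc "$1" ;;
        *)     cat "$1" ;;
    esac
}
```

For example, page_cat /usr/share/man/man1/ls.1.bz2 | nroff -man | less renders a bzip2'd page, and the same call works unchanged on a .gz page.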
Created attachment 5159: portage-maninfo-bz2.patch
Created attachment 5160: texinfo-4.2-bz2.patch
Created attachment 5161: man.conf
I think this should be an *option* for a stage1 install (based on make.conf MAN_COMPRESS, INFO_COMPRESS, DOC_COMPRESS options) but that it should default to gzip. gzip is really the best balance of speed and compression ratio.
I am of the opinion that bzip2 compression with man page caching is the best balance of speed and compression ratio. That doesn't matter, though: I like the _COMPRESS options in make.conf. What do I have to do to get those in there? Should I work on it? Also, I am aware that I need to update this patch for the latest version of portage. I will do that soon.
I will be adding configurable man and info page compression to Portage for MacOS X support. See: http://www.gentoo.org/proj/en/gentoo-alt/macos-1.xml So I'll grab this one.
*** Bug 45232 has been marked as a duplicate of this bug. ***
Is this bug still being worked on? I have provided some simple patches in bug #45232.
Are there any numbers available on how it affects space usage and performance?
I've just misused a server at my company to get some quick numbers. It's a Pentium 4 2.4 GHz with Hyperthreading, running a 2.4.26 SMP kernel with 1 GB of RAM. I cat'ed all the man pages on the machine together (uncompressed), giving a 46.6 MB file (48,821,978 bytes). Quick compression results:
- "gzip": 4.6 sec, result: 10.3 MB (10,810,304 bytes)
- "gzip -9": 5.8 sec, result: 10.3 MB (10,770,473 bytes)
- "bzip2": 21.3 sec, result: 7.7 MB (8,026,931 bytes)
(Note that "bzip2 -9" is the same as "bzip2"; see bzip2(1).)
So while bzip2 is more than three and a half times as slow as gzip with best compression (21.3 s vs. 5.8 s), it reduces file size by about 25%. I for one would certainly like to enable that on all my boxen :)
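To reproduce the size comparison on your own pages, a minimal sketch, assuming gzip, bzip2 and awk are available. compare_sizes and pct_reduction are hypothetical helper names; pct_reduction also checks the arithmetic above: (10,770,473 - 8,026,931) / 10,770,473 is about 25.5%.

```shell
# pct_reduction OLD NEW: how much smaller NEW is than OLD, in percent,
# rounded to one decimal place.
pct_reduction() {
    awk -v old="$1" -v new="$2" 'BEGIN { printf "%.1f\n", (old - new) * 100 / old }'
}

# compare_sizes FILE: hypothetical helper that compresses FILE into a pipe
# with gzip -9 and with bzip2 (the file itself is not modified) and
# prints the resulting byte counts.
compare_sizes() {
    # $(( ... )) strips any padding some wc implementations add.
    gz=$(( $(gzip -9 -c "$1" | wc -c) ))
    bz=$(( $(bzip2 -c "$1" | wc -c) ))
    echo "gzip -9: $gz bytes"
    echo "bzip2:   $bz bytes"
    echo "bzip2 output is $(pct_reduction "$gz" "$bz")% smaller"
}
```

On a system whose pages are still gzipped, something like gzip -dc /usr/share/man/man*/*.gz > /tmp/allman.txt followed by compare_sizes /tmp/allman.txt approximates the experiment above. pct_reduction 22 11 gives the 50.0% from the original report.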
I've added PORTAGE_COMPRESS / PORTAGE_COMPRESS_FLAGS support to the doc/man/info scripts. If unset, they default to 'gzip' and '-9', but you can set PORTAGE_COMPRESS to, say, 'bzip2' and portage will do the rest.
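The defaulting described above boils down to shell parameter expansion. A minimal sketch, not Portage's actual code; compress_doc is a hypothetical name. Note that ${VAR:-default} also falls back when the variable is set but empty.

```shell
# Hypothetical sketch of the described defaults, not Portage's code:
# unset (or empty) PORTAGE_COMPRESS / PORTAGE_COMPRESS_FLAGS fall back
# to gzip and -9.
compress_doc() {
    comp=${PORTAGE_COMPRESS:-gzip}
    flags=${PORTAGE_COMPRESS_FLAGS:--9}
    # $flags is unquoted on purpose so "-9 -f" becomes two arguments.
    "$comp" $flags "$@"
}
```

For example, PORTAGE_COMPRESS=bzip2 compress_doc README ChangeLog would produce README.bz2 and ChangeLog.bz2, while a plain compress_doc README gives README.gz.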
*** Bug 92385 has been marked as a duplicate of this bug. ***
BTW, I think what matters more is how long bzip2 needs (compared to gzip) to uncompress a man page, which happens often, rather than how long it needs to compress one, which happens only once during emerge.
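To measure the part that matters here, a minimal sketch that compresses a file once with each tool and times only the decompression. It assumes bash (for the time keyword), gzip and bzip2; time_decompress is a hypothetical name. A single page decompresses so quickly that you may want to feed it a large concatenated file like the one used in the numbers above.

```shell
# Hypothetical helper; assumes bash for the `time` keyword.
# time_decompress FILE: compress FILE once with gzip -9 and once with
# bzip2, time decompressing each copy, and check both round-trip.
time_decompress() {
    local tmp rc
    tmp=$(mktemp -d) || return 1
    gzip -9 -c "$1" > "$tmp/page.gz"
    bzip2 -c "$1" > "$tmp/page.bz2"
    echo "gzip -dc:"
    time gzip -dc "$tmp/page.gz" > /dev/null
    echo "bzip2 -dc:"
    time bzip2 -dc "$tmp/page.bz2" > /dev/null
    # Both copies must decompress back to the original bytes.
    gzip -dc "$tmp/page.gz" | cmp -s - "$1" &&
        bzip2 -dc "$tmp/page.bz2" | cmp -s - "$1"
    rc=$?
    rm -rf "$tmp"
    return $rc
}
```

The timings are printed by the shell on stderr; the function's exit status only reports whether both round trips were lossless.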
Fixed on or before 2.0.51.22-r1
Looking through the batch of bugs, I'm not sure that some of these are actually fixed in stable. Others, the requirements have possibly changed after the initial fix was committed. If you think this bug has been closed incorrectly, please reopen or ask that it be reopened.
Is this really fixed in 2.0.51.22-r1? I'm using 2.0.51.22-r2 and PORTAGE_COMPRESS doesn't seem to work.
(In reply to comment #16)
> Is this really fixed in 2.0.51.22-r1? I'm using 2.0.51.22-r2 and
> PORTAGE_COMPRESS doesn't seem to work.

If it doesn't work, reopen the bug ;)
_I_ can't reopen the bug since I'm neither the reporter nor a Gentoo dev, so I don't have the necessary permissions (I only get a "Leave as RESOLVED FIXED" radio button). Also, maybe I was doing something wrong. I tried the following:
1) PORTAGE_COMPRESS="bzip2" emerge something
2) Adding PORTAGE_COMPRESS="bzip2" to make.conf
3) Adding PORTAGE_COMPRESS="bzip2" to man.conf
Was I doing something wrong? Where should PORTAGE_COMPRESS be used?
(In reply to comment #18)
> _I_ can't reopen the bug since I wasn't the one who reported it or a Gentoo dev,
> so I don't have the necessary permissions (I only get a "Leave as RESOLVED
> FIXED" radio button).

Yeah sorry, it's still early here :/

> Also, maybe I was doing something wrong. I tried the following:
> 1) PORTAGE_COMPRESS="bzip2" emerge something
> 2) Adding PORTAGE_COMPRESS="bzip2" to make.conf
> 3) Adding PORTAGE_COMPRESS="bzip2" to man.conf
>
> Was I doing something wrong? Where should PORTAGE_COMPRESS be used?

1) should definitely work, as should 2); 3) is wrong, IIRC.
I tested and this works with portage-2.1.0_alpha20050718. Apparently the patch was never applied to portage-2.0.51.22-r2.
I added this to CVS HEAD at the time (which means portage-2.1). PORTAGE_COMPRESS should be set in make.conf. It has some bugs atm (like it will bzip2 manpages that have already been gzipped).
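Going by the replies above, the variable is read from the environment or /etc/make.conf, not from man.conf. A minimal make.conf fragment, assuming a Portage version that actually contains this change (portage-2.1 at the time of this comment):

```shell
# /etc/make.conf
PORTAGE_COMPRESS="bzip2"
# Optional: "-9" is already bzip2's default block size.
PORTAGE_COMPRESS_FLAGS="-9"

# One-off alternative from the command line:
#   PORTAGE_COMPRESS="bzip2" emerge <package>
```

Leaving both variables out keeps the gzip / -9 defaults.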
Mike, want to add this in for 2.0.54 (current trunk) and mark this bug as a blocker of #108262?
The code still has some quirks ... like if an ebuild compresses the manpages itself rather than letting portage do it, portage will compress them twice. Also, we need to agree on how to share the compression code logic so I don't have to duplicate it across multiple files.
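One way to avoid compressing a page twice is to check the file's contents rather than its name. A minimal sketch, assuming od and tr are available; compression_of and compress_once are hypothetical names, and this is not what Portage ended up doing (the approach that was eventually committed is described later in this bug).

```shell
# Hypothetical helper: report how FILE is compressed by looking at its
# first three bytes instead of trusting the file name.
#   gzip  streams start with 1f 8b
#   bzip2 streams start with "BZh" (42 5a 68)
compression_of() {
    magic=$(od -A n -t x1 -N 3 "$1" | tr -d ' \t\n')
    case "$magic" in
        1f8b*)  echo gzip ;;
        425a68) echo bzip2 ;;
        *)      echo none ;;
    esac
}

# compress_once FILE: bzip2 FILE only if it is not already compressed.
compress_once() {
    if [ "$(compression_of "$1")" = none ]; then
        bzip2 -f "$1"
    fi
}
```

A page an ebuild already gzipped (even one installed without a .gz suffix) is then left alone instead of ending up double-compressed.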
Created attachment 80497: prepallman
Created attachment 80498: prepman
Can you guys give these two a spin? They still suffer from the problem where they'll compress already-compressed files, but I think we should be semi-addressing that issue by getting packages to stop compressing manpages and let portage handle it.
*** Bug 130420 has been marked as a duplicate of this bug. ***
Mike, any update on this?
Well, no one gave me feedback and stuff works on my machine, so I guess I'll go ahead and commit it :P
Created attachment 106466: portage-ecompress.patch
OK, I threw out all my old stuff and started from scratch so that I could implement some new ideas I came up with ... this version allows us to unify a whole bunch of duplicate code.
Some of the changes that should be highlighted:
- two new commands: ecompress and ecompressdir
- everything is compressed now ... so if a package wrongly compresses something rather than letting portage do it, it'll get compressed twice ... I don't care; fix your packages
- $D is scanned for "man" dirs that look like they should be a mandir, rather than using a hardcoded list
- do* bins no longer handle compression
- recursive bins (prepman) simply call ecompressdir to do the dirty work
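The mandir scan and the recursive compression can be sketched roughly as follows. This is not the ecompress/ecompressdir code from the patch: find_mandirs and compress_dir are hypothetical names, the "looks like a mandir" test (a directory named man that contains man1 ... man9 or mann style subdirectories) is an assumption, and compression reuses the PORTAGE_COMPRESS / PORTAGE_COMPRESS_FLAGS defaults (gzip, -9) described earlier in this bug. As the comment above says, nothing is skipped, so an already-compressed page gets compressed again.

```shell
# Hypothetical sketch, not Portage's ecompress/ecompressdir code.
# find_mandirs ROOT: print every directory named "man" under ROOT that
# holds at least one man<section> subdirectory (man1 ... man9, mann,
# and variants such as man3p).
find_mandirs() {
    find "$1" -type d -name man | while IFS= read -r dir; do
        for sub in "$dir"/man[0-9n]*; do
            if [ -d "$sub" ]; then
                printf '%s\n' "$dir"
                break
            fi
        done
    done
}

# compress_dir DIR: compress every regular file below DIR in place.
compress_dir() {
    # PORTAGE_COMPRESS_FLAGS is unquoted on purpose so "-9 -f" becomes two args.
    find "$1" -type f -exec "${PORTAGE_COMPRESS:-gzip}" ${PORTAGE_COMPRESS_FLAGS:--9} {} +
}
```

A directory such as usr/share/doc/pkg/man with no section subdirectories is not treated as a mandir, so its contents are left for the doc handling.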
OK, I've fixed up an issue or two and committed it.
This is in svn r5555 and has been released in 2.1.2_rc4-r9.
*** Bug 181359 has been marked as a duplicate of this bug. ***