Bug 686862 - Trimming stage size by shrinking locale-archive
Status: RESOLVED FIXED
Alias: None
Product: Gentoo Release Media
Classification: Unclassified
Component: Stages
Hardware: All Linux
Importance: Normal normal
Assignee: Gentoo Release Team
Reported: 2019-05-27 15:33 UTC by Robin Johnson
Modified: 2020-04-08 00:25 UTC
Description Robin Johnson archtester Gentoo Infrastructure gentoo-dev Security 2019-05-27 15:33:56 UTC
TL;DR:
Omit locale-archive from stage tarballs, or reduce it to a minimal set of locales via locale.gen. This yields substantial savings in both tarball size and on-disk size.

This bug is NOT about removing locales from the stages entirely, but just reducing the pre-compiled locales that are duplicated.

/usr/lib{,64}/locale/locale-archive contains a pre-compiled set of locale data for a host. It is NOT the only copy of the data, merely a pre-compiled set for faster lookup. See bug 187658 for background on the performance boost.

The installation instructions already explain how to configure it for a given system [2].

If the file is not present, glibc still behaves correctly; it just takes a much slower lookup path.
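
As a rough way to see how much space the archive takes on a given system, a sketch along these lines may help. The function name `archive_report` is hypothetical; the paths are the two from /usr/lib{,64}/locale/locale-archive above, and the locale count assumes glibc's `localedef` is on PATH.

```shell
#!/bin/sh
# Hypothetical helper: report the size of each locale-archive, if present,
# and how many locales are compiled into the archive.
archive_report() {
    for f in /usr/lib/locale/locale-archive /usr/lib64/locale/locale-archive; do
        if [ -f "$f" ]; then
            size=$(wc -c < "$f")
            echo "$f: $((size / 1024 / 1024)) MiB"
        else
            echo "$f: not present"
        fi
    done
    # localedef --list-archive lists the locales stored in the archive.
    if command -v localedef >/dev/null 2>&1; then
        echo "locales in archive: $(localedef --list-archive 2>/dev/null | wc -l)"
    fi
}

archive_report
```

On systems where /usr/lib64 is a symlink to /usr/lib, both paths report the same file.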

The default set of locales has grown over time, but with glibc-2.28 the Unicode tables were updated and the archive jumped from 126MiB to 206MiB [1].

This file has been the largest file in stage tarballs for a long time, except for a period in 2014 when it was omitted by default.

In the 2019/05/23 stages, where the file is 210MiB, omitting it shrinks the unpacked stage from 1.4GiB to 1.2GiB. The compressed stage3 tarballs also shrink: ~49MiB saved for bzip2 (out of 415MiB) and 7MiB for xz (out of 261MiB).
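
To put the compressed-tarball figures in relative terms, here is a small worked calculation using only the numbers quoted above. The helper name `pct` is hypothetical.

```shell
#!/bin/sh
# Percentage saved = saved / original * 100.
# LC_ALL=C keeps the decimal separator a dot regardless of the active locale.
pct() {
    LC_ALL=C awk -v saved="$1" -v total="$2" \
        'BEGIN { printf "%.1f\n", saved / total * 100 }'
}

echo "bzip2: $(pct 49 415)% smaller"   # 11.8
echo "xz:    $(pct 7 261)% smaller"    # 2.7
```

The much smaller xz figure suggests that xz already compresses the archive well, so the on-disk saving is the more significant one for xz tarballs.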


[1] https://sourceware.org/glibc/wiki/Release/2.28#The_locale-archive_file_is_much_bigger

[2] https://wiki.gentoo.org/wiki/Handbook:AMD64/Installation/Base#Configure_locales
Comment 1 Michael 'veremitz' Everitt 2019-06-02 21:01:05 UTC
I'd be happy to see fewer locales generated on a stage build, as on slower CPUs with fewer cores this can take a long time (specifically arm32 here).

I'd mooted a reduced set a few times, and I believe dilfridge had tweaked the glibc build to possibly default to a C locale? I believe a couple of options had been kicked around the -releng IRC channel, but nothing concrete decided.

There was a suggestion to adopt an /etc/locale.gen file similar to Debian's (for instance, in Gentoo there is no en_GB.UTF-8 locale mentioned) but if this is likely to have exploded, that is possibly not helpful.
Comment 2 Larry the Git Cow gentoo-dev 2020-04-08 00:25:09 UTC
The bug has been closed via the following commit(s):

https://gitweb.gentoo.org/proj/catalyst.git/commit/?id=5fb710093c0d4643b981db7763f7f14d04e02d79

commit 5fb710093c0d4643b981db7763f7f14d04e02d79
Author:     Matt Turner <mattst88@gentoo.org>
AuthorDate: 2020-03-27 23:34:19 +0000
Commit:     Matt Turner <mattst88@gentoo.org>
CommitDate: 2020-04-07 23:23:07 +0000

    targets: Reduce locales to C.UTF8 in stage builds
    
    By default, glibc generates around 500 locales with more added each
    year.
    
    With USE=-compile-locales, glibc generates the locale archive in
    pkg_postinst(). Since files generated in pkg_postinst() are not recorded
    in the vdb, this has the advantage of allowing users to freely change
    the set of enabled locales (by editing /etc/locale.gen and running
    locale-gen).
    
    Since it is so easy for the user to generate any locales they want with
    locale-gen (and they probably would have anyway to rid themselves of the
    499 locales they don't want!), just disable all locales except for
    C.UTF8 and save stage builders a lot of time.
    
    The patch works by
            (1) Writing /etc/locale.gen with "C.UTF8 UTF-8"
            (2) Setting CONFIG_PROTECT so glibc doesn't overwrite
                /etc/locale.gen
            (3) Running etc-update to reset /etc/locale.gen
    
    In order to do this I modified scripts/bootstrap.sh in commit
    0aa49828ae25 (scripts/bootstrap.sh: Allow CONFIG_PROTECT).
    
    Reducing the set of locales cuts the user time (as reported by time(1))
    of the stage2 and stage3 builds as well as the file size of the
    resulting xz'd tarballs:
    
                stage 2           stage 3
             size     time     size      time
    before    89M   22m42s     206M     45m5s
     after    77M    4m29s     195M     26m8s
    
    An alternative solution would be to set USE=compile-locales for glibc,
    but that has the downside of being non-default and likely causing users
    to unnecessarily rebuild glibc. (We'll do this for the ISOs where we
    want all the locales)
    
    Note that this patch does not change the contents of /etc/locale.gen in
    the stage3 tarball.
    
    Closes: https://bugs.gentoo.org/686862
    Signed-off-by: Matt Turner <mattst88@gentoo.org>

 targets/stage1/chroot.sh            | 1 +
 targets/stage2/chroot.sh            | 6 ++++++
 targets/stage3/chroot.sh            | 7 +++++++
 targets/support/chroot-functions.sh | 6 +++++-
 4 files changed, 19 insertions(+), 1 deletion(-)
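
The three steps listed in the commit message can be sketched roughly as follows. This is not the actual patch: it runs against a scratch directory rather than a real stage chroot, `ROOT` and `write_minimal_locale_gen` are hypothetical names, and the `CONFIG_PROTECT` value and `etc-update` invocation are assumptions about how the steps might be expressed.

```shell
#!/bin/sh
# Sketch only: mirrors steps (1)-(3) against a scratch directory.
ROOT=$(mktemp -d)
mkdir -p "$ROOT/etc"

write_minimal_locale_gen() {
    # (1) Restrict locale generation to C.UTF8 only.
    echo "C.UTF8 UTF-8" > "$1/etc/locale.gen"
}

write_minimal_locale_gen "$ROOT"

# (2) Protect the file so merging glibc does not overwrite it.
# Assumption: "-*" clears the inherited list, leaving only this path.
export CONFIG_PROTECT="-* /etc/locale.gen"

# (3) In a real stage build, once glibc is merged, the pending config
# update would be accepted so the shipped /etc/locale.gen is the default
# again. Assumed invocation, printed here rather than executed:
echo "would run: etc-update --automode -5"

cat "$ROOT/etc/locale.gen"
```

Because the last step restores the default file, a user unpacking the stage3 still sees the stock /etc/locale.gen and can enable locales there before running locale-gen.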