I switched my whole system to Unicode a while ago. Everything works fine except for one bug I have run into several times: when emerging a program, e.g. subversion, the build fails if the source files contain non-ASCII characters that are not UTF-8 encoded (e.g. ISO-8859-1 text that includes special characters such as German umlauts).

I thought a quick workaround would be to run LANG=de_DE emerge foo, but this doesn't work. Portage seems to source /etc/profile or something similar. At the moment this means either switching the locale (edit /etc/env.d/03locale, run env-update, emerge foo, then switch back again) or converting the source files manually. Not good.

I also experienced a similar problem when setting the LANGUAGE variable in there: this breaks the OpenOffice ebuild, even if I set LANGUAGE to something like GER within /etc/make.conf.

Just to demonstrate:

perseus subversion-1.0.6 # locale
LANG=de_DE.UTF-8
LC_CTYPE="de_DE.UTF-8"
LC_NUMERIC="de_DE.UTF-8"
LC_TIME="de_DE.UTF-8"
LC_COLLATE=C
LC_MONETARY="de_DE.UTF-8"
LC_MESSAGES="de_DE.UTF-8"
LC_PAPER="de_DE.UTF-8"
LC_NAME="de_DE.UTF-8"
LC_ADDRESS="de_DE.UTF-8"
LC_TELEPHONE="de_DE.UTF-8"
LC_MEASUREMENT="de_DE.UTF-8"
LC_IDENTIFICATION="de_DE.UTF-8"
LC_ALL=

perseus subversion-1.0.6 # LANG=de_DE emerge subversion
Calculating dependencies ...done!
>>> emerge (1 of 1) dev-util/subversion-1.0.6 to /
>>> md5 src_uri ;-) subversion-1.0.6.tar.bz2
make[1]: Entering directory `/var/tmp/portage/subversion-1.0.6/work/subversion-1.0.6/subversion/bindings/java/javahl/src'
CLASSPATH=../cls:./../cls:$CLASSPATH /opt/sun-jdk-1.5.0_beta2/bin/javac -d ../cls -g org/tigris/subversion/javahl/BlameCallback.java org/tigris/subversion/javahl/ClientException.java org/tigris/subversion/javahl/DirEntry.java org/tigris/subversion/javahl/JNIError.java org/tigris/subversion/javahl/LogMessage.java org/tigris/subversion/javahl/NodeKind.java org/tigris/subversion/javahl/Notify.java org/tigris/subversion/javahl/PromptUserPassword.java org/tigris/subversion/javahl/PromptUserPassword2.java org/tigris/subversion/javahl/PromptUserPassword3.java org/tigris/subversion/javahl/PropertyData.java org/tigris/subversion/javahl/Revision.java org/tigris/subversion/javahl/SVNClient.java org/tigris/subversion/javahl/SVNClientInterface.java org/tigris/subversion/javahl/SVNClientSynchronized.java org/tigris/subversion/javahl/Status.java
org/tigris/subversion/javahl/DirEntry.java:28: unmappable character for encoding UTF8
 * @author C�dric Chabanois
            ^
org/tigris/subversion/javahl/Status.java:25: unmappable character for encoding UTF8
 * @author C�dric Chabanois

Reproducible: Always

Steps to Reproduce:
1. Set up your system to use unicode (LANG=de_DE.UTF-8)
2. emerge a package containing sources with ISO-8859-1 contents
3. watch the errors...
Actual Results:
Build is failing

Expected Results:
Build should work, at least when doing LANG=de_DE emerge foo

Portage 2.0.50-r9 (default-x86-1.4, gcc-3.3.3, glibc-2.3.3.20040420-r0, 2.6.7-gentoo-r9)
=================================================================
System uname: 2.6.7-gentoo-r9 i686 Intel(R) Pentium(R) 4 CPU 2.60GHz
Gentoo Base System version 1.4.16
distcc 2.13 i686-pc-linux-gnu (protocols 1 and 2) (default port 3632) [enabled]
Autoconf: sys-devel/autoconf-2.59-r3
Automake: sys-devel/automake-1.8.3
ACCEPT_KEYWORDS="x86"
AUTOCLEAN="yes"
CFLAGS="-march=pentium4 -O3 -funroll-loops -fomit-frame-pointer -pipe"
CHOST="i686-pc-linux-gnu"
COMPILER="gcc3"
CONFIG_PROTECT="/etc /usr/X11R6/lib/X11/xkb /usr/kde/2/share/config /usr/kde/3.1/share/config /usr/kde/3.2/share/config /usr/kde/3/share/config /usr/lib/mozilla/defaults/pref /usr/share/config /usr/share/texmf/dvipdfm/config/ /usr/share/texmf/dvips/config/ /usr/share/texmf/tex/generic/config/ /usr/share/texmf/tex/platex/config/ /usr/share/texmf/xdvi/ /var/qmail/control"
CONFIG_PROTECT_MASK="/etc/gconf /etc/terminfo /etc/env.d"
CXXFLAGS="-march=pentium4 -O3 -funroll-loops -fomit-frame-pointer -pipe"
DISTDIR="/usr/portage/distfiles"
FEATURES="autoaddcvs ccache distcc"
GENTOO_MIRRORS="ftp://sunsite.informatik.rwth-aachen.de/pub/Linux/gentoo ftp://ftp.tu-clausthal.de/pub/linux/gentoo/ http://linux.rz.ruhr-uni-bochum.de/download/gentoo-mirror/ ftp://linux.rz.ruhr-uni-bochum.de/gentoo-mirror/ http://mirrors.sec.informatik.tu-darmstadt.de/gentoo http://ftp.uni-erlangen.de/pub/mirrors/gentoo ftp://ftp.uni-erlangen.de/pub/mirrors/gentoo"
MAKEOPTS="-j3"
PKGDIR="/usr/portage/packages"
PORTAGE_TMPDIR="/var/tmp"
PORTDIR="/usr/portage"
PORTDIR_OVERLAY="/usr/local/portage"
SYNC="rsync://rsync.europe.gentoo.org/gentoo-portage"
USE="X acpi alsa apache2 apm arts avi berkdb cdr crypt cups doc dvd encode esd foomaticdb gdbm gif gnome gpm gstreamer gtk gtk2 gtkhtml guile imlib java jpeg libg++ libwww linguas_de lufsusermount mad mikmod mmx motif mozilla mozilla-firebird mpeg ncurses nls nptl oggvorbis opengl oss pam pdf pdflib perl php png python qt quicktime readline samba sdl slang spell sse ssl svga tcltk tcpd tetex threads tiff truetype unicode usb vim vim-with-x x86 xml2 xmms xv zlib"
I think portage should set LANG=C by default.
*** Bug 41947 has been marked as a duplicate of this bug. ***
*** Bug 9901 has been marked as a duplicate of this bug. ***
If so, there should be a way to turn NLS support back on. I set LANG=C personally, but some users may want messages in ja_JP.UTF-8. I think packages that break depending on LANG should be fixed on the package/ebuild side, not on the portage side.
The problem is with the source code. Lots of developers (especially French guys) write their whole source in plain ASCII except for the "@author" tags (or whatever corresponds to them). Because most of them seem to use ISO-8859-15, a UTF-8 system (and a UTF-8 java) bails out because of an illegal char in the input (which is generally correct behaviour). This makes it plain impossible to maintain a system with a UTF-8 locale, as _lots_ of packages break, especially the java ones. What you're proposing would mean lots of patches for all these java packages, either to convert them to UTF-8 or just to remove the offending lines. That would be a lot of work compared to a simple LANG=C. A compromise might be to set these variables just (and only) before calling make/ant, or before any step that really touches the source files.
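To illustrate why a strict UTF-8 reader rejects such an "@author" line, here is a minimal Python sketch. It is not what javac does internally; it only shows that the ISO-8859-1 byte for "é" is not valid UTF-8 on its own, and that a one-time re-encoding (the manual conversion mentioned in the report) makes the text acceptable. The variable names are made up for the example.

```python
# "Cédric Chabanois" as an ISO-8859-1 editor would save it:
# the "é" (U+00E9) becomes the single byte 0xE9.
latin1_bytes = "C\u00e9dric Chabanois".encode("iso-8859-1")

# In UTF-8, 0xE9 starts a three-byte sequence, but the following byte
# ("d") is not a continuation byte, so strict decoding fails.
try:
    latin1_bytes.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# Converting once from ISO-8859-1 to UTF-8 yields bytes that a
# UTF-8 reader accepts without complaint.
utf8_bytes = latin1_bytes.decode("iso-8859-1").encode("utf-8")
roundtrip = utf8_bytes.decode("utf-8")
```

The sketch assumes the file really is ISO-8859-1; for ISO-8859-15 input the codec name would have to change accordingly.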
If it's a problem with the source code, please take it upstream (that's our policy). What I was saying is: if setting LANG to C is enough, it should be done in an ebuild, not in portage. Most of our ebuilds are not broken wrt LANG, so why should we force users to see English messages instead of localised ones? If portage were just a black box for installing packages, that would be fine, but I think portage is also there to help people build packages. Some users would benefit from localised error messages and could figure out what the problem is, so I object if portage forces LANG=C and takes away the opportunity to set it to one's native language.
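The narrower option discussed above, overriding the locale only for the step that touches the sources, can be sketched as follows. This is Python rather than ebuild code, and run_with_locale is a hypothetical helper, not a portage function. Setting both LANG and LC_ALL and dropping LANGUAGE is an assumption of the sketch; whether that is enough for a given package is not something it establishes.

```python
import os
import subprocess
import sys


def run_with_locale(cmd, locale_name="C"):
    """Run cmd with the locale overridden for that child process only."""
    env = dict(os.environ)        # copy, so the caller's environment is untouched
    env["LANG"] = locale_name
    env["LC_ALL"] = locale_name   # LC_ALL takes precedence over LANG and LC_*
    env.pop("LANGUAGE", None)     # assumption: drop the message-language list too
    return subprocess.run(cmd, env=env, capture_output=True, text=True, check=True)


# Stand-in for the real build command (make, ant, ...): a child process
# that reports the LC_ALL value it was started with.
probe = [sys.executable, "-c", "import os; print(os.environ['LC_ALL'])"]
result = run_with_locale(probe)
child_locale = result.stdout.strip()
```

Because only the child's environment is changed, everything outside that one call still runs in the user's own locale.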
So basically ISO character sets and program developers are to blame?
developers who assume the regex [a-zA-Z] matches all alpha characters are to blame
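A small Python sketch of that assumption going wrong. Python's re module is used only as a convenient illustration here; it is not locale-dependent in the way POSIX tools are, and the second pattern is just one possible way to match letters in Python, not a recommendation from this thread.

```python
import re

name = "C\u00e9dric"  # "Cédric"

# The ASCII-only range stops matching at the accented letter.
ascii_letters = re.compile(r"[a-zA-Z]+")
ascii_full = ascii_letters.fullmatch(name)
ascii_parts = ascii_letters.findall(name)

# One way to say "any letter" in Python: a word character that is
# neither a digit nor an underscore.
any_letters = re.compile(r"[^\W\d_]+")
any_full = any_letters.fullmatch(name)
```

Here ascii_full is None and ascii_parts splits the name at the "é", while any_full matches the whole string.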
packages should build sanely regardless of LANG
*** Bug 133758 has been marked as a duplicate of this bug. ***