
Bug 707026

Summary: app-text/gnome-doc-utils-0.20.10-r2 - line 611, in merge self.out.write(doc.doc.serialize('utf-8', 1)) UnicodeEncodeError: 'ascii' codec can't encode character '\xe4' in position 687: ordinal not in range(128)
Product: Gentoo Linux
Component: Current packages
Status: UNCONFIRMED ---
Severity: normal
Priority: Normal
Version: unspecified
Hardware: PPC64
OS: Linux
Reporter: ernsteiswuerfel <erhard_f>
Assignee: Gentoo Linux Gnome Desktop Team <gnome>
CC: chris, ds-gentoo, luca.chiampo, mattst88, sachse, thomas, todd
See Also: https://bugs.gentoo.org/show_bug.cgi?id=743980
Whiteboard:
Package list:
Runtime testing required: ---
Attachments:
  build.log
  emerge --info
  patch: use ascii encoding instead of utf-8
  emerge --info

Description ernsteiswuerfel archtester 2020-01-28 21:19:42 UTC
Created attachment 607226
build.log

[...]
Making all in gnome-doc-make
make[2]: Entering directory '/var/tmp/portage/app-text/gnome-doc-utils-0.20.10-r2/work/gnome-doc-utils-0.20.10-python3_6/doc/gnome-doc-make'
if ! test -d de/; then mkdir "de/"; fi
msgfmt -o de/de.mo /var/tmp/portage/app-text/gnome-doc-utils-0.20.10-r2/work/gnome-doc-utils-0.20.10/doc/gnome-doc-make/de/de.po
if ! test -d de/; then mkdir de/; fi
if [ -f "C/gnome-doc-make.xml" ]; then d="../"; else d="/var/tmp/portage/app-text/gnome-doc-utils-0.20.10-r2/work/gnome-doc-utils-0.20.10/doc/gnome-doc-make/"; fi; \
mo="de/de.mo"; \
if [ -f "${mo}" ]; then mo="../${mo}"; else mo="/var/tmp/portage/app-text/gnome-doc-utils-0.20.10-r2/work/gnome-doc-utils-0.20.10/doc/gnome-doc-make/${mo}"; fi; \
(cd de/ && \
  PYTHONPATH="/var/tmp/portage/app-text/gnome-doc-utils-0.20.10-r2/work/gnome-doc-utils-0.20.10-python3_6/xml2po:/var/tmp/portage/app-text/gnome-doc-utils-0.20.10-r2/work/gnome-doc-utils-0.20.10/xml2po:" "/var/tmp/portage/app-text/gnome-doc-utils-0.20.10-r2/work/gnome-doc-utils-0.20.10-python3_6/xml2po/xml2po/xml2po" -m docbook -e -t "${mo}" \
    "${d}C/gnome-doc-make.xml" > gnome-doc-make.xml.tmp && \
    cp gnome-doc-make.xml.tmp gnome-doc-make.xml && rm -f gnome-doc-make.xml.tmp)
Traceback (most recent call last):
  File "/var/tmp/portage/app-text/gnome-doc-utils-0.20.10-r2/work/gnome-doc-utils-0.20.10-python3_6/xml2po/xml2po/xml2po", line 191, in <module>
    main(sys.argv[1:])
  File "/var/tmp/portage/app-text/gnome-doc-utils-0.20.10-r2/work/gnome-doc-utils-0.20.10-python3_6/xml2po/xml2po/xml2po", line 174, in main
    xml2po_main.merge(mofile, filenames[0])
  File "/var/tmp/portage/app-text/gnome-doc-utils-0.20.10-r2/work/gnome-doc-utils-0.20.10/xml2po/xml2po/__init__.py", line 611, in merge
    self.out.write(doc.doc.serialize('utf-8', 1))
UnicodeEncodeError: 'ascii' codec can't encode character '\xe4' in position 687: ordinal not in range(128)
make[2]: *** [Makefile:684: de/gnome-doc-make.xml] Error 1
make[2]: Leaving directory '/var/tmp/portage/app-text/gnome-doc-utils-0.20.10-r2/work/gnome-doc-utils-0.20.10-python3_6/doc/gnome-doc-make'
make[1]: *** [Makefile:275: all-recursive] Error 1
make[1]: Leaving directory '/var/tmp/portage/app-text/gnome-doc-utils-0.20.10-r2/work/gnome-doc-utils-0.20.10-python3_6/doc'
make: *** [Makefile:364: all-recursive] Error 1
 * ERROR: app-text/gnome-doc-utils-0.20.10-r2::gentoo failed (compile phase):
 *   emake failed
Comment 1 ernsteiswuerfel archtester 2020-01-28 21:20:14 UTC
Created attachment 607238
emerge --info
Comment 2 ernsteiswuerfel archtester 2020-01-29 16:20:54 UTC
Same on ppc.
Comment 3 Dirk Sondermann 2020-01-30 22:58:43 UTC
Whether this error occurs depends on the locale of the build environment (and on which locales are available).

For me,

  LC_CTYPE=POSIX          emerge -1 =app-text/gnome-doc-utils-0.20.10-r2
  LC_CTYPE=de_DE.iso88591 emerge -1 =app-text/gnome-doc-utils-0.20.10-r2

both fail [the first with "UnicodeEncodeError: 'ascii' codec can't encode character '\xe4' in position 687: ordinal not in range(128)", the second with "UnicodeEncodeError: 'latin-1' codec can't encode characters in position 3583-3588: ordinal not in range(256)"], whereas

  LC_CTYPE=C.utf8         emerge -1 =app-text/gnome-doc-utils-0.20.10-r2

succeeds.
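This locale dependence can be reproduced without building anything. The following is a minimal Python sketch. It assumes, as holds for the Python 3.6 used in these logs, that sys.stdout encodes text with the locale's character set. The hypothetical helper `try_write` wraps an in-memory buffer in a text stream with a given encoding, the way sys.stdout wraps the output file, and then writes sample text to it. The sample string is made up; the actual characters at positions 3583-3588 of the translated document are not shown in this report.

```python
import io
from typing import Optional


def try_write(encoding: str, text: str) -> Optional[str]:
    """Write `text` through a text stream using `encoding`, as sys.stdout would.

    Returns None on success, or the UnicodeEncodeError message on failure.
    """
    # sys.stdout is a TextIOWrapper around a binary stream; in Python 3.6 its
    # encoding follows the locale (LC_CTYPE). Model it with an in-memory buffer.
    stream = io.TextIOWrapper(io.BytesIO(), encoding=encoding)
    try:
        stream.write(text)
        stream.flush()  # push pending output down to the byte buffer
    except UnicodeEncodeError as exc:
        return str(exc)
    return None


# Hypothetical sample: 'ä' exists in Latin-1, the curly quotes do not.
sample = "B\u00e4ume \u201czitiert\u201d"

# Roughly the encodings that POSIX, de_DE.iso88591 and C.utf8 select.
for enc in ("ascii", "latin-1", "utf-8"):
    print(enc, "->", try_write(enc, sample) or "ok")
```

Only the UTF-8 stream accepts the text. Python 3.7 and later may coerce a plain C/POSIX locale to UTF-8 (PEP 538, PEP 540). A Latin-1 locale such as de_DE.iso88591 still yields a Latin-1 stdout.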

The reason is that

  self.out.write(doc.doc.serialize('utf-8', 1))

in gnome-doc-utils-0.20.10/xml2po/xml2po/__init__.py writes a document serialized as UTF-8 to out = sys.stdout. It does this without making sure that sys.stdout, whose encoding follows the locale, can encode it.

Replacing 'utf-8' with 'ascii' here avoids the problem.
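The following Python sketch shows why the 'ascii' patch helps, along with one hypothetical alternative. Both are offered under stated assumptions.

When asked for an encoding that cannot represent a character, libxml2 writes that character as a numeric character reference. As a result, an 'ascii' serialization contains only ASCII, and any locale's stdout can encode it. `to_ascii_xml` below reproduces that idea with Python's standard `xmlcharrefreplace` error handler. It is an analogue, not libxml2's code, and the exact reference form libxml2 emits may differ.

`write_utf8` is a hypothetical alternative that keeps UTF-8 output but writes the encoded bytes to the stream's underlying `.buffer`, bypassing the locale. It assumes `out` is a regular text stream such as sys.stdout.

Neither helper is part of xml2po.

```python
import io


def to_ascii_xml(text: str) -> str:
    """Replace every non-ASCII character with an XML numeric character reference.

    Stdlib analogue of serializing with encoding 'ascii': the result is pure
    ASCII, so a stream in any ASCII-compatible locale encoding can write it.
    Only valid where character references are allowed (text content and
    attribute values), not inside element or attribute names.
    """
    return text.encode("ascii", errors="xmlcharrefreplace").decode("ascii")


def write_utf8(out: io.TextIOWrapper, text: str) -> None:
    """Hypothetical alternative: emit UTF-8 bytes regardless of the locale.

    Text streams like sys.stdout expose their binary stream as `.buffer`.
    Flush the text layer first so earlier output is not reordered.
    """
    out.flush()
    out.buffer.write(text.encode("utf-8"))
    out.buffer.flush()


snippet = "<para>Gr\u00f6\u00dfe \u00e4ndern</para>"
print(to_ascii_xml(snippet))  # prints <para>Gr&#246;&#223;e &#228;ndern</para>
```

`write_utf8` would raise AttributeError for streams without a `.buffer`, such as io.StringIO. The escaping approach, as in the patch, avoids that by changing only the serialization.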
Comment 4 Dirk Sondermann 2020-01-30 23:03:38 UTC
Created attachment 609506
patch: use ascii encoding instead of utf-8
Comment 5 ernsteiswuerfel archtester 2020-01-30 23:21:31 UTC
(In reply to Dirk Sondermann from comment #4)
> Created attachment 609506
> patch: use ascii encoding instead of utf-8
Thanks! Your patch works for me.

I have the following locales in /etc/locale.gen:
en_US ISO-8859-1
en_US.UTF-8 UTF-8
de_DE ISO-8859-1
de_DE@euro ISO-8859-15
de_DE.UTF-8 UTF-8
Comment 6 Todd Walter 2020-02-06 15:02:52 UTC
I have this as well on AMD64.

UnicodeEncodeError: 'latin-1' codec can't encode characters in position 3583-3588: ordinal not in range(256)
Comment 7 Todd Walter 2020-02-06 15:10:37 UTC
Created attachment 612084
emerge --info
Comment 8 Martin Dummer 2020-02-22 19:22:02 UTC
I can confirm this on arch AMD64.

LC_CTYPE=C.utf8         emerge -1 =app-text/gnome-doc-utils-0.20.10-r2

works for me too. 

Same workaround (build with a UTF-8 locale) as in Bug 702338 and Bug 688330.
Comment 9 Dmitry 2020-03-08 08:18:08 UTC
LC_CTYPE=C.utf8 emerge =app-text/gnome-doc-utils-0.20.10-r2

has failed for me.


Portage 2.3.89 (python 3.6.10-final-0, default/linux/amd64/17.1/desktop, gcc-8.3.0, glibc-2.29-r7, 5.5.7-gentoo-x86_64 x86_64)
=================================================================
System uname: Linux-5.5.7-gentoo-x86_64-x86_64-AMD_Ryzen_9_3900X_12-Core_Processor-with-gentoo-2.6
KiB Mem:    32805768 total,   7019080 free
KiB Swap:   12582908 total,  12581164 free
Timestamp of repository gentoo: Sun, 08 Mar 2020 05:37:39 +0000
Head commit of repository gentoo: 1d50432dfefa747ad0bc6a47b23ed8c38920f178

sh bash 4.4_p23-r1
ld GNU ld (Gentoo 2.33.1 p2) 2.33.1
app-shells/bash:          4.4_p23-r1::gentoo
dev-java/java-config:     2.2.0-r4::gentoo
dev-lang/perl:            5.30.1::gentoo
dev-lang/python:          2.7.17-r1::gentoo, 3.6.10::gentoo, 3.7.6::gentoo
dev-util/cmake:           3.14.6::gentoo
dev-util/pkgconfig:       0.29.2::gentoo
sys-apps/baselayout:      2.6-r1::gentoo
sys-apps/openrc:          0.42.1::gentoo
sys-apps/sandbox:         2.13::gentoo
sys-devel/autoconf:       2.13-r1::gentoo, 2.69-r4::gentoo
sys-devel/automake:       1.16.1-r1::gentoo
sys-devel/binutils:       2.33.1-r1::gentoo
sys-devel/gcc:            8.3.0-r3::gentoo, 9.2.0-r2::gentoo
sys-devel/gcc-config:     2.2.1::gentoo
sys-devel/libtool:        2.4.6-r6::gentoo
sys-devel/make:           4.2.1-r4::gentoo
sys-kernel/linux-headers: 4.19::gentoo (virtual/os-headers)
sys-libs/glibc:           2.29-r7::gentoo
Repositories:

xelibrion-repo
    location: /usr/local/portage
    masters: gentoo

dlang
    location: /var/lib/layman/dlang
    masters: gentoo
    priority: 50

jorgicio
    location: /var/lib/layman/jorgicio
    masters: gentoo
    priority: 50

gentoo
    location: /usr/portage
    sync-type: git
    sync-uri: https://github.com/gentoo-mirror/gentoo.git
    priority: 1000

ACCEPT_KEYWORDS="amd64"
ACCEPT_LICENSE="* -@EULA"
CBUILD="x86_64-pc-linux-gnu"
CFLAGS="-march=native -O2 -pipe"
CHOST="x86_64-pc-linux-gnu"
CONFIG_PROTECT="/etc /usr/share/config /usr/share/gnupg/qualified.txt"
CONFIG_PROTECT_MASK="/etc/ca-certificates.conf /etc/dconf /etc/env.d /etc/fonts/fonts.conf /etc/gconf /etc/gentoo-release /etc/revdep-rebuild /etc/sandbox.d /etc/terminfo"
CXXFLAGS="-march=native -O2 -pipe"
DISTDIR="/usr/portage/distfiles"
EMERGE_DEFAULT_OPTS="--jobs=24 --load-average=24"
ENV_UNSET="DBUS_SESSION_BUS_ADDRESS DISPLAY GOBIN PERL5LIB PERL5OPT PERLPREFIX PERL_CORE PERL_MB_OPT PERL_MM_OPT XAUTHORITY XDG_CACHE_HOME XDG_CONFIG_HOME XDG_DATA_HOME XDG_RUNTIME_DIR"
FCFLAGS="-O2 -pipe"
FEATURES="assume-digests binpkg-docompress binpkg-dostrip binpkg-logs config-protect-if-modified distlocks ebuild-locks fixlafiles ipc-sandbox merge-sync multilib-strict network-sandbox news parallel-fetch pid-sandbox preserve-libs protect-owned qa-unresolved-soname-deps sandbox sfperms strict unknown-features-warn unmerge-logs unmerge-orphans userfetch userpriv usersandbox usersync xattr"
FFLAGS="-O2 -pipe"
GENTOO_MIRRORS="ftp://ftp-stud.hs-esslingen.de/pub/Mirrors/gentoo/ ftp://mirror.csclub.uwaterloo.ca/gentoo-distfiles/ ftp://mirror.yandex.ru/gentoo-distfiles/"
LANG="C"
LC_ALL=""
LDFLAGS="-Wl,-O1 -Wl,--as-needed"
MAKEOPTS="-j24 -l25"
PKGDIR="/usr/portage/packages"
PORTAGE_CONFIGROOT="/"
PORTAGE_RSYNC_OPTS="--recursive --links --safe-links --perms --times --omit-dir-times --compress --force --whole-file --delete --stats --human-readable --timeout=180 --exclude=/distfiles --exclude=/local --exclude=/packages --exclude=/.git"
PORTAGE_TMPDIR="/var/tmp"
USE="X a52 aac acl acpi alsa amd64 berkdb bluetooth branding bzip2 cairo cdda cdr chromium cli consolekit crypt cuda cups curl cxx dbus dri dts dvd dvdr emacs emboss encode eudev exif flac fortran gdbm gif gpm gtk iconv icu ios ipv6 jpeg lcms ldap libnotify libtirpc mad mng mp3 mp4 mpeg multilib ncurses network nls nptl nvenc ogg opengl openmp pam pango pcre pdf png policykit ppds pulseaudio qt5 readline sdl seccomp spell split-usr ssl startup-notification svg tcpd tiff truetype udev udisks unicode upower usb vorbis wxwidgets x264 x265 xattr xcb xft xinerama xml xv xvid zlib" ABI_X86="64" ADA_TARGET="gnat_2018" ALSA_CARDS="ali5451 als4000 atiixp atiixp-modem bt87x ca0106 cmipci emu10k1x ens1370 ens1371 es1938 es1968 fm801 hda-intel intel8x0 intel8x0m maestro3 trident usb-audio via82xx via82xx-modem ymfpci" APACHE2_MODULES="authn_core authz_core socache_shmcb unixd actions alias auth_basic authn_alias authn_anon authn_dbm authn_default authn_file authz_dbm authz_default authz_groupfile authz_host authz_owner authz_user autoindex cache cgi cgid dav dav_fs dav_lock deflate dir disk_cache env expires ext_filter file_cache filter headers include info log_config logio mem_cache mime mime_magic negotiation rewrite setenvif speling status unique_id userdir usertrack vhost_alias" CALLIGRA_FEATURES="karbon sheets words" COLLECTD_PLUGINS="df interface irq load memory rrdtool swap syslog" CPU_FLAGS_X86="aes avx avx2 f16c fma3 mmx mmxext pclmul popcnt sha sse sse2 sse3 sse4_1 sse4_2 sse4a ssse3" ELIBC="glibc" GPSD_PROTOCOLS="ashtech aivdm earthmate evermore fv18 garmin garmintxt gpsclock greis isync itrax mtk3301 nmea ntrip navcom oceanserver oldstyle oncore rtcm104v2 rtcm104v3 sirf skytraq superstar2 timing tsip tripmate tnt ublox ubx" INPUT_DEVICES="libinput keyboard mouse" KERNEL="linux" LCD_DEVICES="bayrad cfontz cfontz633 glk hd44780 lb216 lcdm001 mtxorb ncurses text" LIBREOFFICE_EXTENSIONS="presenter-console presenter-minimizer" OFFICE_IMPLEMENTATION="libreoffice" 
PHP_TARGETS="php7-2" POSTGRES_TARGETS="postgres10 postgres11" PYTHON_SINGLE_TARGET="python3_6" PYTHON_TARGETS="python2_7 python3_6" QEMU_SOFTMMU_TARGETS="x86_64" QEMU_USER_TARGETS="x86_64" RUBY_TARGETS="ruby24 ruby25" USERLAND="GNU" VIDEO_CARDS="nvidia" XTABLES_ADDONS="quota2 psd pknock lscan length2 ipv4options ipset ipp2p iface geoip fuzzy condition tee tarpit sysrq steal rawnat logmark ipmark dhcpmac delude chaos account"
Unset:  CC, CPPFLAGS, CTARGET, CXX, INSTALL_MASK, LINGUAS, PORTAGE_BINHOST, PORTAGE_BUNZIP2_COMMAND, PORTAGE_COMPRESS, PORTAGE_COMPRESS_FLAGS, PORTAGE_RSYNC_EXTRA_OPTS
Comment 10 mark lybarger 2020-04-08 12:27:57 UTC
LC_CTYPE=C.utf8 emerge =app-text/gnome-doc-utils-0.20.10-r2

Works for me. Not sure what info I can provide to help get a better fix. Obviously, this is temporary; I can't rebuild a clean system with this.
Comment 11 Matt Turner gentoo-dev 2020-04-08 18:10:18 UTC
Is the choice of a non-utf8 locale intentional?

Can't you just switch to the utf8-equivalent locale with eselect locale set ...?
Comment 12 Luca Chiampo 2021-02-14 08:40:43 UTC
Also for me on AMD64

LC_CTYPE="it_IT.UTF-8" emerge app-text/gnome-doc-utils

builds successfully.
Comment 13 Sam James archtester Gentoo Infrastructure gentoo-dev Security 2021-02-17 19:41:05 UTC
(In reply to Luca Chiampo from comment #12)
> Also for me on AMD64
> 
> LC_CTYPE="it_IT.UTF-8" emerge app-text/gnome-doc-utils
> 
> builds successfully.

(No need to CC random teams, but thanks for letting us know it works)
Comment 14 Celphor 2021-02-20 15:05:05 UTC
(In reply to Dirk Sondermann from comment #4)
> Created attachment 609506
> patch: use ascii encoding instead of utf-8

Thank you. The patch also works with a latin-1 locale.
Comment 16 ganome 2022-02-23 13:45:46 UTC
This bug still exists as of Feb 2022 on AMD64. However, I find it easier to use the "eselect locale" module. For instance:

eselect locale list
Available targets for the LANG variable:
  [1]   C
  [2]   C.utf8
  [3]   en_US
  [4]   en_US.iso88591
  [5]   en_US.utf8 *
  [6]   POSIX
  [ ]   (free form)

sudo eselect locale set 5
sudo emerge -av -1 gnome-doc-utils

problem solved!
Comment 17 Jakov Smolić archtester gentoo-dev 2022-02-23 17:25:31 UTC
(Please don't CC arch teams unnecessarily.)
Comment 18 Hubert 2022-04-15 12:49:22 UTC
As of 15.04.2022, it still only works when I use this:
sudo LC_CTYPE="it_IT.UTF-8" emerge app-text/gnome-doc-utils
Comment 19 Gert Doering 2022-08-11 08:21:46 UTC
11.08.2022: this hit me as well today.

My primary locale is en_US.ISO-8859-1, because I regularly have to work with old software that is not UTF-8 capable, and it's easier to just keep the system in latin1.
Comment 20 Mart Raudsepp gentoo-dev 2022-08-11 08:37:41 UTC
I think you should look into working around your old software instead of keeping your whole system in a 1990-era locale. You could make launchers for the old software that just launch the actual binary with LC_ALL=en_US.ISO-8859-1. For example, if you have /usr/bin/old-binary, you could create /usr/local/bin/old-binary with the contents

#!/bin/sh
export LC_ALL=en_US.ISO-8859-1
exec /usr/bin/old-binary "$@"

and then fix your system locale. If there's a desktop file, though, it might launch the binary from /usr/bin/ directly and would need adjusting too.


Anyhow, I wouldn't expect this to get fixed: it's already a legacy package that only a few other legacy packages still use. You really should be using a UTF-8 system locale, and you probably have other subtle issues elsewhere if you don't.