Bug 721402 - sys-apps/portage: UnicodeEncodeError: 'utf-8' codec can't encode characters: surrogates not allowed
Status: RESOLVED FIXED
Alias: None
Product: Portage Development
Classification: Unclassified
Component: Core
Hardware: All Linux
Importance: Normal minor
Assignee: Portage team
URL:
Whiteboard:
Keywords: InVCS
Depends on:
Blocks: 721152
Reported: 2020-05-07 09:27 UTC by Vladimir Varlamov
Modified: 2020-07-22 16:22 UTC (History)

See Also:
Package list:
Runtime testing required: ---


Description Vladimir Varlamov 2020-05-07 09:27:48 UTC
When emerging perl, the following error occurs at the install or qmerge phase, when the hardlock temporary file is created:

Traceback (most recent call last):
  File "/usr/lib/portage/python3.6/ebuild-ipc.py", line 277, in <module>
    sys.exit(ebuild_ipc_main(sys.argv[1:]))
  File "/usr/lib/portage/python3.6/ebuild-ipc.py", line 273, in ebuild_ipc_main
    return ebuild_ipc.communicate(args)
  File "/usr/lib/portage/python3.6/ebuild-ipc.py", line 130, in communicate
    lock_obj = portage.locks.lockfile(self.ipc_lock_file, unlinkfile=True)
  File "/usr/lib64/python3.6/site-packages/portage/locks.py", line 135, in lockfile
    unlinkfile=unlinkfile, waiting_msg=waiting_msg, flags=flags)
  File "/usr/lib64/python3.6/site-packages/portage/locks.py", line 324, in _lockfile_iteration
    (removed, fstat_result) = _lockfile_was_removed(myfd, lockfilename)
  File "/usr/lib64/python3.6/site-packages/portage/locks.py", line 379, in _lockfile_was_removed
    hardlink_path = hardlock_name(lock_path)
  File "/usr/lib64/python3.6/site-packages/portage/locks.py", line 519, in hardlock_name
    (tail, os.uname()[1], os.getpid()))
  File "/usr/lib64/python3.6/site-packages/portage/__init__.py", line 245, in __call__
    wrapped_args, wrapped_kwargs = self._process_args(args, kwargs)
  File "/usr/lib64/python3.6/site-packages/portage/__init__.py", line 232, in _process_args
    for x in args]
  File "/usr/lib64/python3.6/site-packages/portage/__init__.py", line 232, in <listcomp>
    for x in args]
  File "/usr/lib64/python3.6/site-packages/portage/__init__.py", line 185, in _unicode_encode
    s = s.encode(encoding, errors)
UnicodeEncodeError: 'utf-8' codec can't encode characters in position 20-22: surrogates not allowed


I added some debug output to this call to see what is happening:

_unicode_encode(s='..ipc_lock.hardlock-\udce1\udc9b\udc9c-25307', encoding=utf_8, errors='strict')

# hostname
ᛜ

# hostname | od -x
0000000 9be1 0a9c
0000004

# python 
Python 3.7.7 (default, May  5 2020, 01:54:13) 
[GCC 9.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import os
>>> os.uname()[1]
'ᛜ'
>>> os.uname()[1].encode('utf_8',errors='strict')
b'\xe1\x9b\x9c'
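
The interactive session above succeeds, while the debug output shows the hostname arriving as three lone surrogates. One plausible explanation, stated here as an assumption rather than something confirmed in this report, is that the failing process decoded the hostname bytes with a non-UTF-8 filesystem encoding (for example ASCII under a C/POSIX locale) using the surrogateescape error handler. A minimal sketch that reproduces the same error under that assumption (the file name is modelled on the debug output; the PID is just the one from the report):

```python
# Raw hostname bytes as shown by `hostname | od -x` (UTF-8 for U+16DC).
raw_hostname = b"\xe1\x9b\x9c"

# Assumption: the failing process decoded the hostname as ASCII with the
# surrogateescape handler, so each non-ASCII byte became a lone surrogate.
decoded = raw_hostname.decode("ascii", errors="surrogateescape")
print(ascii(decoded))

# Illustrative file name modelled on the debug output in this report.
name = "..ipc_lock.hardlock-%s-%s" % (decoded, 25307)

error = None
try:
    # Strict UTF-8 encoding refuses lone surrogates.
    name.encode("utf_8", "strict")
except UnicodeEncodeError as exc:
    error = exc
    print(exc)
```

Under a UTF-8 locale the same bytes decode directly to 'ᛜ', which would explain why the interactive test does not fail.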


--
Portage 2.3.99 (python 3.6.10-final-0, default/linux/amd64/17.1, gcc-9.3.0, glibc-2.30-r8, 4.14.32-std522-amd64 x86_64)
=================================================================
System uname: Linux-4.14.32-std522-amd64-x86_64-Intel-R-_Core-TM-_i3-2120_CPU_@_3.30GHz-with-gentoo-2.6
KiB Mem:     8148368 total,    373064 free
KiB Swap:    2097148 total,   2097148 free
Timestamp of repository gentoo: Wed, 06 May 2020 05:30:01 +0000
Head commit of repository gentoo: 3e6b8985084256d7afce1953701d08f4a1d84fbe
sh bash 5.0_p17
ld GNU ld (Gentoo 2.33.1 p2) 2.33.1
app-shells/bash:          5.0_p17::gentoo
dev-java/java-config:     2.2.0-r4::gentoo
dev-lang/perl:            5.30.1::gentoo
dev-lang/python:          2.7.18::gentoo, 3.6.10-r2::gentoo, 3.7.7-r2::gentoo, 3.8.2-r2::gentoo
dev-util/cmake:           3.16.5::gentoo
sys-apps/baselayout:      2.6-r1::gentoo
sys-apps/openrc:          0.42.1::gentoo
sys-apps/sandbox:         2.13::gentoo
sys-devel/autoconf:       2.69-r4::gentoo
sys-devel/automake:       1.15.1-r2::gentoo, 1.16.1-r1::gentoo
sys-devel/binutils:       2.33.1-r1::gentoo
sys-devel/gcc:            9.3.0::gentoo
sys-devel/gcc-config:     2.2.1::gentoo
sys-devel/libtool:        2.4.6-r6::gentoo
sys-devel/make:           4.2.1-r4::gentoo
sys-kernel/linux-headers: 5.6::gentoo (virtual/os-headers)
sys-libs/glibc:           2.30-r8::gentoo
Repositories:

gentoo
    location: /var/portage
    sync-type: rsync
    sync-uri: rsync://rsync.gentoo.org/gentoo-portage
    priority: -1000
    sync-rsync-extra-opts: 
    sync-rsync-verify-max-age: 24
    sync-rsync-verify-jobs: 1
    sync-rsync-verify-metamanifest: yes

bes
    location: /var/db/repos/gentoo-overlay-bes
    masters: gentoo
    priority: 51

ACCEPT_KEYWORDS="amd64"
ACCEPT_LICENSE="*"
CBUILD="x86_64-pc-linux-gnu"
CFLAGS="-O2 -march=sandybridge -pipe -fomit-frame-pointer"
CHOST="x86_64-pc-linux-gnu"
CONFIG_PROTECT="/etc /usr/share/gnupg/qualified.txt /var/spool/munin-async/.ssh"
CONFIG_PROTECT_MASK="/etc/ca-certificates.conf /etc/env.d /etc/fonts/fonts.conf /etc/gconf /etc/gentoo-release /etc/php/apache2-php7.4/ext-active/ /etc/php/cgi-php7.4/ext-active/ /etc/php/cli-php7.4/ext-active/ /etc/revdep-rebuild /etc/sandbox.d /etc/terminfo"
CXXFLAGS="-O2 -march=sandybridge -pipe -fomit-frame-pointer"
DISTDIR="/var/distfiles"
EMERGE_DEFAULT_OPTS="--jobs 3 --load-average 5"
ENV_UNSET="DBUS_SESSION_BUS_ADDRESS DISPLAY GOBIN PERL5LIB PERL5OPT PERLPREFIX PERL_CORE PERL_MB_OPT PERL_MM_OPT XAUTHORITY XDG_CACHE_HOME XDG_CONFIG_HOME XDG_DATA_HOME XDG_RUNTIME_DIR"
FCFLAGS="-O2 -pipe"
FEATURES="assume-digests binpkg-docompress binpkg-dostrip binpkg-logs config-protect-if-modified distlocks ebuild-locks fixlafiles ipc-sandbox merge-sync multilib-strict network-sandbox news parallel-fetch pid-sandbox preserve-libs protect-owned qa-unresolved-soname-deps sandbox sfperms strict unknown-features-warn unmerge-logs unmerge-orphans userfetch userpriv usersandbox usersync xattr"
FFLAGS="-O2 -pipe"
GENTOO_MIRRORS="http://distfiles.gentoo.org"
LANG="en_GB.utf8"
LDFLAGS="-Wl,-O1 -Wl,--as-needed"
LINGUAS="en"
MAKEOPTS="-j2"
PKGDIR="/var/cache/binpkgs"
PORTAGE_CONFIGROOT="/"
PORTAGE_RSYNC_OPTS="--recursive --links --safe-links --perms --times --omit-dir-times --compress --force --whole-file --delete --stats --human-readable --timeout=180 --exclude=/distfiles --exclude=/local --exclude=/packages --exclude=/.git"
PORTAGE_TMPDIR="/var/tmp"
USE="acl acpi amd64 bash-completion bzip2 bzlib charconv cli crypt curl dnsdb dovecot-sasl dri exiscan-acl fastcgi ffmpeg fftw fontconfig gdbm gnutls gpm http2 iconv icu idn ipv6 jpeg jpeg2k lame libtirpc lm_sensors lzma maildir multilib mysql ncurses nfsv41 nls nptl openmp pam pcre png postscript readline sasl seccomp spf split-usr sqlite ssl symlink syslog threads truetype unicode urandom utf8 webp winbind xattr zlib" ABI_X86="64" ADA_TARGET="gnat_2018" ALSA_CARDS="ali5451 als4000 atiixp atiixp-modem bt87x ca0106 cmipci emu10k1x ens1370 ens1371 es1938 es1968 fm801 hda-intel intel8x0 intel8x0m maestro3 trident usb-audio via82xx via82xx-modem ymfpci" APACHE2_MODULES="*" APACHE2_MPMS="worker" CALLIGRA_FEATURES="karbon sheets words" COLLECTD_PLUGINS="df interface irq load memory rrdtool swap syslog" CPU_FLAGS_X86="avx mmx mmxext pclmul popcnt sse sse2 sse3 sse4_1 sse4_2 ssse3" ELIBC="glibc" GPSD_PROTOCOLS="ashtech aivdm earthmate evermore fv18 garmin garmintxt gpsclock greis isync itrax mtk3301 nmea ntrip navcom oceanserver oldstyle oncore rtcm104v2 rtcm104v3 sirf skytraq superstar2 timing tsip tripmate tnt ublox ubx" GRUB_PLATFORMS="emu pc efi-64" INPUT_DEVICES="libinput" KERNEL="linux" LCD_DEVICES="bayrad cfontz cfontz633 glk hd44780 lb216 lcdm001 mtxorb ncurses text" LIBREOFFICE_EXTENSIONS="presenter-console presenter-minimizer" NGINX_MODULES_HTTP="echo access auth_basic autoindex browser charset empty_gif headers_more fastcgi geo geoip2 gzip gzip_static image_filter limit_req limit_conn map perl proxy realip rewrite referer ssi stub_status userid addition auth_request brotli sub" OFFICE_IMPLEMENTATION="libreoffice" PHP_TARGETS="php7-2" POSTGRES_TARGETS="postgres10 postgres11" PYTHON_SINGLE_TARGET="python3_6" PYTHON_TARGETS="python2_7 python3_6" RUBY_TARGETS="ruby24 ruby25" USERLAND="GNU" VIDEO_CARDS="amdgpu fbdev intel nouveau radeon radeonsi vesa dummy v4l" XTABLES_ADDONS="quota2 psd pknock lscan length2 ipv4options ipset ipp2p iface geoip fuzzy condition tee 
tarpit sysrq steal rawnat logmark ipmark dhcpmac delude chaos account"
Unset:  CC, CPPFLAGS, CTARGET, CXX, INSTALL_MASK, LC_ALL, PORTAGE_BINHOST, PORTAGE_BUNZIP2_COMMAND, PORTAGE_COMPRESS, PORTAGE_COMPRESS_FLAGS, PORTAGE_RSYNC_EXTRA_OPTS

Reproducible: Always
Comment 1 Mike Gilbert gentoo-dev 2020-05-07 15:01:51 UTC
To clarify: are you intentionally trying to break portage by setting a strange hostname?
Comment 2 Vladimir Varlamov 2020-05-07 15:50:50 UTC
(In reply to Mike Gilbert from comment #1)
> To clarify: are you intentionally trying to break portage by setting a
> strange hostname?

Not at all. But do you think it is acceptable in 2020 to fail on encoding a character that has been in Unicode's Basic Multilingual Plane since 1999? If I can create a file with this symbol in its name, then what is the problem?
Please, this is not the place to work out why Python handles Unicode so badly, or to debate emotionally in general. But what I see is a program that is not entirely well written.
Comment 3 Mike Gilbert gentoo-dev 2020-05-07 16:13:55 UTC
So you really use the Unicode symbol [ᛜ] U+16DC RUNIC LETTER INGWAZ as your hostname normally?

I'm not saying this is not a valid bug, but at least be honest about your intentions.
Comment 4 Vladimir Varlamov 2020-05-07 16:16:42 UTC
Seriously. I bought a domain. https://ᛜ.net/
Comment 5 Mike Gilbert gentoo-dev 2020-05-07 16:27:12 UTC
(In reply to Vladimir Varlamov from comment #4)
> Seriously. I bought a domain. https://ᛜ.net/

That gets translated to https://xn--txe.net/ by modern user agents. Probably a limitation of DNS or HTTP.
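
The xn-- form is the Punycode (ASCII-compatible) encoding of the label, which is how internationalized domain names are written in DNS. A minimal sketch using Python's built-in punycode codec; it deliberately skips the normalization and validation steps that a full IDNA implementation performs, so treat it as an illustration only:

```python
# U+16DC RUNIC LETTER INGWAZ, the single-character label from the URL.
label = "\u16dc"

# Punycode-encode the label and add the ACE prefix used for IDN labels.
ace_label = "xn--" + label.encode("punycode").decode("ascii")

print(ace_label + ".net")
```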
Comment 6 Zac Medico gentoo-dev 2020-05-07 16:30:40 UTC
(In reply to Vladimir Varlamov from comment #0)
> When emerging perl, the following error occurs at the install or qmerge phase, when the hardlock temporary file is created:
> 
> Traceback (most recent call last):
>   File "/usr/lib/portage/python3.6/ebuild-ipc.py", line 277, in <module>
>     sys.exit(ebuild_ipc_main(sys.argv[1:]))
>   File "/usr/lib/portage/python3.6/ebuild-ipc.py", line 273, in ebuild_ipc_main
>     return ebuild_ipc.communicate(args)
>   File "/usr/lib/portage/python3.6/ebuild-ipc.py", line 130, in communicate
>     lock_obj = portage.locks.lockfile(self.ipc_lock_file, unlinkfile=True)
>   File "/usr/lib64/python3.6/site-packages/portage/locks.py", line 135, in lockfile
>     unlinkfile=unlinkfile, waiting_msg=waiting_msg, flags=flags)
>   File "/usr/lib64/python3.6/site-packages/portage/locks.py", line 324, in _lockfile_iteration
>     (removed, fstat_result) = _lockfile_was_removed(myfd, lockfilename)
>   File "/usr/lib64/python3.6/site-packages/portage/locks.py", line 379, in _lockfile_was_removed
>     hardlink_path = hardlock_name(lock_path)
>   File "/usr/lib64/python3.6/site-packages/portage/locks.py", line 519, in hardlock_name
>     (tail, os.uname()[1], os.getpid()))

The trigger is the os.uname()[1] usage here. We can use something like portage._decode_argv([os.uname()[1]])[0] to translate the surrogate here.
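
The general idea of translating the surrogates can be sketched in plain Python. This is not the body of portage._decode_argv; translate_surrogates is a hypothetical helper, and the "ascii" default stands in for whatever filesystem encoding the failing process actually used:

```python
def translate_surrogates(text, fs_encoding="ascii", target="utf_8"):
    """Undo surrogateescape decoding, then decode the raw bytes as `target`.

    Hypothetical helper for illustration only.
    """
    # surrogateescape turns each lone surrogate back into its original byte.
    raw = text.encode(fs_encoding, errors="surrogateescape")
    # Decode the recovered bytes properly; invalid sequences become U+FFFD.
    return raw.decode(target, errors="replace")


# The surrogate sequence from the debug output in this report.
escaped = "\udce1\udc9b\udc9c"
hostname = translate_surrogates(escaped)

# The translated hostname now encodes cleanly with strict UTF-8.
encoded = hostname.encode("utf_8", "strict")
print(ascii(hostname), encoded)
```

The key point is that surrogateescape is lossless in both directions, so the original bytes can always be recovered and reinterpreted with the correct encoding.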
Comment 7 Vladimir Varlamov 2020-05-07 19:44:09 UTC
> We can use something like 
> portage._decode_argv([os.uname()[1]])[0] to translate the surrogate here.

Tested. It works!
Comment 8 Larry the Git Cow gentoo-dev 2020-05-07 20:35:23 UTC
The bug has been referenced in the following commit(s):

https://gitweb.gentoo.org/proj/portage.git/commit/?id=def3574d3fe9b944dd83e561462ccc6de6f90ff3

commit def3574d3fe9b944dd83e561462ccc6de6f90ff3
Author:     Zac Medico <zmedico@gentoo.org>
AuthorDate: 2020-05-07 20:32:03 +0000
Commit:     Zac Medico <zmedico@gentoo.org>
CommitDate: 2020-05-07 20:34:02 +0000

    locks: translate surrogate from uname (bug 721402)
    
    Prevent an error like this when attempting to encode a surrogate:
    
    UnicodeEncodeError: 'utf-8' codec can't encode characters in position 20-22: surrogates not allowed
    
    Tested-by: Vladimir Varlamov <bes.internal@gmail.com>
    Bug: https://bugs.gentoo.org/721402
    Signed-off-by: Zac Medico <zmedico@gentoo.org>

 lib/portage/locks.py | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)
Comment 9 Larry the Git Cow gentoo-dev 2020-05-25 00:24:08 UTC
The bug has been referenced in the following commit(s):

https://gitweb.gentoo.org/repo/gentoo.git/commit/?id=77960c6cf984530dbcab9fe507e170e7a2fe7dcf

commit 77960c6cf984530dbcab9fe507e170e7a2fe7dcf
Author:     Zac Medico <zmedico@gentoo.org>
AuthorDate: 2020-05-25 00:20:07 +0000
Commit:     Zac Medico <zmedico@gentoo.org>
CommitDate: 2020-05-25 00:20:58 +0000

    sys-apps/portage: Bump to version 2.3.100
    
     #715108 Change default BINPKG_COMPRESS to zstd
     #719456 Add dependency on app-arch/zstd
     #720866 Do not set PKG_CONFIG_PATH
     #721402 Hostname UnicodeEncodeError surrogates not allowed
     #721516 Suppress precompressed QA notice for docompress -x
    
    Bug: https://bugs.gentoo.org/721152
    Bug: https://bugs.gentoo.org/715108
    Bug: https://bugs.gentoo.org/719456
    Bug: https://bugs.gentoo.org/720866
    Bug: https://bugs.gentoo.org/721402
    Bug: https://bugs.gentoo.org/721516
    Package-Manager: Portage-2.3.100, Repoman-2.3.22
    Signed-off-by: Zac Medico <zmedico@gentoo.org>

 sys-apps/portage/Manifest               |   1 +
 sys-apps/portage/portage-2.3.100.ebuild | 261 ++++++++++++++++++++++++++++++++
 2 files changed, 262 insertions(+)