Hi there! I'm working on an ebuild for a package that has Unicode characters in its HTML documentation. When I install the docs (let's set aside the ban on the base eclass and dohtml), I get the following traceback:

====
Traceback (most recent call last):
  File "/usr/lib/portage/python2.7/dohtml.py", line 235, in <module>
    main()
  File "/usr/lib/portage/python2.7/dohtml.py", line 220, in main
    success |= install(basename, dirname, options)
  File "/usr/lib/portage/python2.7/dohtml.py", line 99, in install
    install(i, dirname, options, pfx)
  File "/usr/lib/portage/python2.7/dohtml.py", line 72, in install
    fullpath = os.path.join(dirname, fullpath)
  File "/usr/lib64/python2.7/posixpath.py", line 80, in join
    path += '/' + b
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 7: ordinal not in range(128)
====

Reproducible: Always
Created attachment 413282: emerge --info
We can use portage.os to transparently handle encoding and decoding in most places. With Python 3, we should use surrogateescape to encode command arguments (like install.py does). We should die if any arguments or listdir results do not decode as valid UTF-8.
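To illustrate the idea, here is a minimal Python 3 sketch of strict UTF-8 validation for arguments and directory listings. It is not the code from the patch: the helper names decode_strict_utf8 and checked_listdir are hypothetical, and it uses only the standard library (os.fsencode, which applies the surrogateescape handler on POSIX) rather than portage.os.

```python
import os


def decode_strict_utf8(arg):
    """Return arg as text, raising UnicodeDecodeError on invalid UTF-8.

    Hypothetical helper for illustration only.
    """
    if isinstance(arg, str):
        # Python 3 decodes argv with surrogateescape, so undecodable bytes
        # are smuggled through as lone surrogates. os.fsencode reverses
        # that and recovers the original bytes.
        raw = os.fsencode(arg)
    else:
        raw = arg
    # A strict decode fails loudly instead of passing bad names along.
    return raw.decode("utf-8", "strict")


def checked_listdir(dirname):
    """List dirname, requiring every entry name to be valid UTF-8.

    Hypothetical helper for illustration only.
    """
    # Passing a bytes path makes os.listdir return raw bytes names.
    return [decode_strict_utf8(name) for name in os.listdir(os.fsencode(dirname))]
```

A caller would catch UnicodeDecodeError around these helpers and exit with an error message, which corresponds to the "die" behaviour described above.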
There's a patch in the following branch:

https://github.com/zmedico/portage/tree/bug_561846

You can test it like this:

    echo '=sys-apps/portage-9999 **' >> /etc/portage/package.accept_keywords
    portage_LIVE_BRANCH=bug_561846 \
    portage_LIVE_REPO=https://github.com/zmedico/portage.git \
    emerge -1 =sys-apps/portage-9999

I've posted it for review here:

https://archives.gentoo.org/gentoo-portage-dev/message/f6a3f74300789f3eeb8371352b5da48d
This is in the master branch: https://gitweb.gentoo.org/proj/portage.git/commit/?id=c788a835067c5ffe8859f38078b390f06a223f5d
Fixed in 2.2.23.