
Bug 531636

Summary: app-portage/gentoolkit-0.4.0: equery can't handle unicode filenames reliably
Product: Portage Development
Reporter: Patrick Lauer <patrick>
Component: Third-Party Tools
Assignee: Portage Tools Team <tools-portage>
Status: RESOLVED FIXED    
Severity: normal    
Priority: Normal    
Version: unspecified   
Hardware: All   
OS: Linux   
Whiteboard:
Package list:
Runtime testing required: ---

Description Patrick Lauer gentoo-dev 2014-12-04 10:22:11 UTC
# equery s app-misc/ca-certificates
Traceback (most recent call last):
  File "/usr/lib/python-exec/python2.7/equery", line 38, in <module>
    equery.main(sys.argv)
  File "/usr/lib64/python2.7/site-packages/gentoolkit/equery/__init__.py", line 357, in main
    loaded_module.main(module_args)
  File "/usr/lib64/python2.7/site-packages/gentoolkit/equery/size.py", line 192, in main
    display_size(matches)
  File "/usr/lib64/python2.7/site-packages/gentoolkit/equery/size.py", line 82, in display_size
    size, files, uncounted = pkg.size()
  File "/usr/lib64/python2.7/site-packages/gentoolkit/package.py", line 383, in size
    st = os.lstat(path)
UnicodeEncodeError: 'ascii' codec can't encode character u'\u011f' in position 49: ordinal not in range(128)
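The error message shows that the path reached the ASCII codec. A minimal Python 3 sketch can reproduce that failure mode in isolation. It encodes a hypothetical path containing u'\u011f' once with "ascii" and once with "utf-8". This is only an illustration of the error, not gentoolkit code.

```python
# Hypothetical path containing U+011F (LATIN SMALL LETTER G WITH BREVE),
# the character named in the traceback.
path = "/usr/share/ca-certificates/mozilla/hypothetical_\u011f.crt"


def try_encode(text, codec):
    """Return the encoded bytes, or the UnicodeEncodeError the codec raises."""
    try:
        return text.encode(codec)
    except UnicodeEncodeError as exc:
        return exc


ascii_result = try_encode(path, "ascii")
utf8_result = try_encode(path, "utf-8")

print(type(ascii_result).__name__)                         # UnicodeEncodeError
print(hex(ord(ascii_result.object[ascii_result.start])))   # 0x11f
print(isinstance(utf8_result, bytes))                      # True
```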


CONTENTS shows filenames like:
obj /usr/share/ca-certificates/mozilla/NetLock_Arany_=Class_Gold=_F<C5><91>tan<C3><BA>s<C3><AD>tv<C3><A1>ny.crt 22f5bca8ba618e920978c24c4b68d84c 1378540449
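The bytes shown in that CONTENTS line (for example <C5><91> for "ő") are valid UTF-8. Decoding them explicitly therefore gives the same result under any locale. The following is a minimal sketch assuming the "obj <path> <md5> <mtime>" layout seen above. The helper `parse_obj_line` is hypothetical, not Portage's or gentoolkit's parser. It splits from the right so that spaces inside a path would survive.

```python
def parse_obj_line(raw_line):
    """Split one CONTENTS 'obj' entry (given as bytes) into (path, md5, mtime).

    Assumes the 'obj <path> <md5> <mtime>' layout and UTF-8 encoded bytes;
    decoding explicitly means the result does not depend on the locale.
    """
    line = raw_line.decode("utf-8").rstrip("\n")
    kind, rest = line.split(" ", 1)
    if kind != "obj":
        raise ValueError("not an obj entry: %r" % kind)
    # Split from the right so any spaces inside the path are preserved.
    path, md5, mtime = rest.rsplit(" ", 2)
    return path, md5, int(mtime)


raw = (b"obj /usr/share/ca-certificates/mozilla/"
       b"NetLock_Arany_=Class_Gold=_F\xc5\x91tan\xc3\xbas\xc3\xadtv\xc3\xa1ny.crt "
       b"22f5bca8ba618e920978c24c4b68d84c 1378540449\n")
path, md5, mtime = parse_obj_line(raw)
print(ascii(path))  # ascii() escapes non-ASCII, so printing is locale-safe
print(md5, mtime)
```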


No locale is set, so this is pretty much a "default" stage3 that was unpacked and then played around in.
Comment 1 Zac Medico gentoo-dev 2017-09-05 23:17:34 UTC
Using eselect locale to set a UTF-8 locale may help in some cases.
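A small check can show which encoding Python actually derives from the current locale, for example before and after running eselect locale. This is a sketch. `locale_is_utf8` is a hypothetical helper. Python 3.7 and later may also coerce the C locale to UTF-8 (PEP 538 and PEP 540), so results can differ from the Python 2.7 environment in this report.

```python
import codecs
import locale
import sys


def locale_is_utf8():
    """Return True if the locale-derived text encoding is UTF-8."""
    # getpreferredencoding(False) reports the encoding implied by the
    # current locale without calling setlocale() again.
    preferred = locale.getpreferredencoding(False)
    # codecs.lookup() normalizes aliases such as "UTF8" to "utf-8".
    return codecs.lookup(preferred).name == "utf-8"


print("preferred encoding:", locale.getpreferredencoding(False))
print("filesystem encoding:", sys.getfilesystemencoding())
print("UTF-8 locale:", locale_is_utf8())
```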
Comment 2 Paul Varner (RETIRED) gentoo-dev 2017-09-06 18:39:22 UTC
Fixed in git

https://gitweb.gentoo.org/proj/gentoolkit.git/commit/?id=308e33dc9e0cba958a583d86799dcb660ba39cb1

As an aside, I despise how Python depends on locale settings for Unicode; finding all the places that need explicit encoding has been painful.
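One way to apply the explicit encoding idea is to pass a UTF-8 encoded bytes path to os.lstat(). On POSIX, Python does not re-encode a bytes path, so the locale's filesystem encoding never comes into play. What follows is a minimal Python 3 sketch assuming a POSIX system and UTF-8 filenames. `lstat_utf8` is a hypothetical helper, not what the linked commit does, and that commit may take a different approach.

```python
import os
import tempfile


def lstat_utf8(path):
    """Hypothetical helper: os.lstat() on an explicitly UTF-8 encoded path.

    A bytes path is passed to the OS as-is on POSIX, so the result does
    not depend on LANG/LC_ALL or the interpreter's filesystem encoding.
    """
    if isinstance(path, str):
        path = path.encode("utf-8")
    return os.lstat(path)


with tempfile.TemporaryDirectory() as tmpdir:
    name = "F\u0151tan\u00fas\u00edtv\u00e1ny.crt"
    # Create the file through a bytes path as well, so this example
    # itself does not rely on the locale either.
    raw_path = os.path.join(os.fsencode(tmpdir), name.encode("utf-8"))
    with open(raw_path, "wb") as fh:
        fh.write(b"dummy certificate\n")
    st = lstat_utf8(os.path.join(tmpdir, name))
    print(st.st_size)  # 18
```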