| Summary: | sys-apps/portage-2.2_rc54[python3]: bytes/str mixing during "global updates on /etc/portage/package.*" | | |
|---|---|---|---|
| Product: | Portage Development | Reporter: | Nikolay Orlyuk <virkony> |
| Component: | Unclassified | Assignee: | Portage team <dev-portage> |
| Status: | RESOLVED FIXED | | |
| Severity: | normal | Keywords: | InVCS |
| Priority: | High | | |
| Version: | unspecified | | |
| Hardware: | All | | |
| OS: | All | | |
| Whiteboard: | | | |
| Package list: | | Runtime testing required: | --- |
| Bug Depends on: | | | |
| Bug Blocks: | 288499 | | |

Attachments:
bytes/str mixing fix in pym.portage.update.update_config_files(...)
fix the ValueError
Description
Nikolay Orlyuk
2009-12-05 09:58:51 UTC
Created attachment 212118 [details, diff]
bytes/str mixing fix in pym.portage.update.update_config_files(...)
Created attachment 212189 [details, diff]
fix the ValueError
This patch should be equivalent to yours, but smaller.
(In reply to comment #2)
> Created an attachment (id=212189) [details]
> fix the ValueError
>
> This patch should be equivalent to yours, but smaller.

I just tried to make that code clear, i.e. from the line:

    config_file = os.path.join(abs_user_config, x)

and further:

    for parent, dirs, files in os.walk(config_file):

I think abs_user_config is read from some text file and is already decoded to unicode. As a result, config_file is decoded as well. So parent, dirs and files will contain unicode strings, and you'll be unable to control the way they are decoded from the FS.

If there was no intention to skip names like "In my language - ????" by doing strict decoding and to avoid further problems while filling file_contents, then that code can look like:

    for parent, dirs, files in os.walk(config_file):
        for y in [dir for dir in dirs if dir.startswith('.')]:
            dirs.remove(y)
        recursivefiles.extend([fn for fn in files if not fn.startswith('.')])

which is compatible with python2 (it allows u".test".startswith('.')) and with python3, as '.' is str.

(In reply to comment #3)
> If there was no intention to skip names like "In my language - ????" by doing
> strict decoding and avoid further problems while filling file_contents,

Sorry, in python2, if something can't be decoded from the FS during os.walk, that will cause a UnicodeDecodeError instead of replacing it with "?". In python 3.1 it generates '\udcXX', where XX is the original byte, which causes problems when opening a file with such a name.

Anyway, as I see it, that code was written that way with the intention of having control over FS decoding and avoiding any of those problems.

I think the code is safe as is with my patch, although it will skip anything that's not encoded with utf-8 (_encodings['fs'] is currently hardcoded as utf-8). I suppose we can change all the config file code to use _encodings['merge'] instead, which corresponds to sys.getfilesystemencoding().
Currently, _encodings['merge'] is only used in the merge/unmerge code, but I think it makes sense to use it for user config files as well.

(In reply to comment #4)
> Sorry, in python2 if something can't be decoded from FS during os.walk that
> will cause UnicodeDecodeError instead of replacing with "?".
> In python 3.1 it generates '\udcXX' where XX is original byte, which cause
> problems while opening file with such name.

Hopefully these issues are no problem because we have a wrapped version of os.walk (from portage.os) which encodes arguments from unicode to bytes. This is why walk yields bytes rather than unicode.

This is fixed in 2.1.7.11 and 2.2_rc56. If there are any remaining issues then please file a new bug.
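
For readers following the thread, here is a minimal Python 3 sketch of the approach discussed above: walk the tree as bytes, compare names against a bytes prefix, and decode strictly so that undecodable names are skipped. It is not the actual Portage patch. The function name list_config_files and the constant FS_ENCODING are hypothetical, and the plain os.walk call fed with os.fsencode() only stands in for the wrapped portage.os version mentioned in the last comment.

```python
import os

# Assumption: mirrors the hardcoded utf-8 value of _encodings['fs'] mentioned in the thread.
FS_ENCODING = "utf-8"


def list_config_files(config_dir):
    """Hypothetical helper: return non-hidden files under config_dir as str paths."""
    found = []
    # Passing bytes to os.walk makes it yield bytes for parent, dirs and files.
    for parent, dirs, files in os.walk(os.fsencode(config_dir)):
        # The prefix must be bytes too: b".x".startswith(".") raises TypeError on Python 3.
        # Assigning to dirs[:] prunes hidden directories in place so they are not descended into.
        dirs[:] = [d for d in dirs if not d.startswith(b".")]
        for name in files:
            if name.startswith(b"."):
                continue
            try:
                # Strict decoding: names that are not valid in FS_ENCODING raise here.
                path = os.path.join(parent, name).decode(FS_ENCODING, "strict")
            except UnicodeDecodeError:
                continue  # skip the undecodable name instead of carrying it further
            found.append(path)
    return sorted(found)
```

Keeping everything as bytes until the single decode step means the type of the '.' prefix never depends on what os.walk happens to return, which is the kind of mixing this bug is about. Switching FS_ENCODING to sys.getfilesystemencoding() would correspond to the _encodings['merge'] suggestion above.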