Bug 295805 - sys-apps/portage-2.2_rc54[python3]: bytes/str mixing during "global updates on /etc/portage/package.*"
Status: RESOLVED FIXED
Alias: None
Product: Portage Development
Classification: Unclassified
Component: Unclassified
Hardware: All All
Importance: High normal
Assignee: Portage team
URL:
Whiteboard:
Keywords: InVCS
Depends on:
Blocks: 288499
Reported: 2009-12-05 09:58 UTC by Nikolay Orlyuk
Modified: 2009-12-07 03:31 UTC (History)
0 users

See Also:
Package list:
Runtime testing required: ---


Attachments
bytes/str mixing fix in pym.portage.update.update_config_files(...) (portage-2.2_rc54-python3.patch,2.63 KB, patch)
2009-12-05 10:01 UTC, Nikolay Orlyuk
fix the ValueError (update_config_files.patch,895 bytes, patch)
2009-12-05 23:32 UTC, Zac Medico

Description Nikolay Orlyuk 2009-12-05 09:58:51 UTC
Python 3 is strict about the difference between "bytes" and "str". Part of pym/portage/update.py handles filesystem-encoded names and their native string representations inconsistently, mixing the two.

Reproducible: Always

Steps to Reproduce:

Actual Results:  
Performing Global Updates: /usr/portage/profiles/updates/4Q-2009
(Could take a couple of minutes if you have a lot of binary packages.)
  .='update pass'  *='binary update'  #='/var/db update'  @='/var/db move'
  s='/var/db SLOT move'  %='binary move'  S='binary SLOT move'
  p='update /etc/portage/package.*'
..................
Traceback (most recent call last):
  File "/usr/bin/emerge", line 42, in <module>
    retval = emerge_main()
  File "/usr/lib64/portage/pym/_emerge/main.py", line 1161, in emerge_main
    if portage._global_updates(trees, mtimedb["updates"]):
  File "/usr/lib64/portage/pym/portage/__init__.py", line 8966, in _global_updates
    myupd)
  File "/usr/lib64/portage/pym/portage/update.py", line 208, in update_config_files
    dirs.remove(y)
ValueError: list.remove(x): x not in list

Expected Results:  
Performing Global Updates: /usr/portage/profiles/updates/4Q-2009
(Could take a couple of minutes if you have a lot of binary packages.)
  .='update pass'  *='binary update'  #='/var/db update'  @='/var/db move'
  s='/var/db SLOT move'  %='binary move'  S='binary SLOT move'
  p='update /etc/portage/package.*'
..................p@%#
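
For illustration, the failure in the traceback can be reproduced in isolation. This is only a minimal sketch, not the actual code from pym/portage/update.py: it assumes that `dirs` holds bytes names while the value passed to `dirs.remove()` has already been decoded to str. The function names `remove_hidden_buggy` and `remove_hidden_fixed` are hypothetical.

```python
def remove_hidden_buggy(dirs, encoding="utf-8"):
    """Decode each entry, then remove the *decoded* value (assumed bug)."""
    for y_enc in list(dirs):
        y = y_enc.decode(encoding)
        if y.startswith("."):
            # On Python 3 a str never compares equal to bytes, so this
            # raises "ValueError: list.remove(x): x not in list".
            dirs.remove(y)
    return dirs


def remove_hidden_fixed(dirs, encoding="utf-8"):
    """Decode only for inspection; always remove the original bytes object."""
    for y_enc in list(dirs):
        try:
            y = y_enc.decode(encoding)
        except UnicodeDecodeError:
            # Names that do not decode strictly are skipped entirely.
            dirs.remove(y_enc)
            continue
        if y.startswith("."):
            dirs.remove(y_enc)
    return dirs
```

On Python 2 the buggy variant would likely go unnoticed for ASCII names, because a unicode string and a byte string with the same ASCII content compare equal there.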
Comment 1 Nikolay Orlyuk 2009-12-05 10:01:04 UTC
Created attachment 212118
bytes/str mixing fix in pym.portage.update.update_config_files(...)
Comment 2 Zac Medico gentoo-dev 2009-12-05 23:32:48 UTC
Created attachment 212189
fix the ValueError

This patch should be equivalent to yours, but smaller.
Comment 3 Nikolay Orlyuk 2009-12-06 11:42:18 UTC
(In reply to comment #2)
> Created an attachment (id=212189) [details]
> fix the ValueError
> 
> This patch should be equivalent to yours, but smaller.
> 
I was just trying to make that code clearer, i.e. starting from the line:
config_file = os.path.join(abs_user_config, x)
and further down:
for parent, dirs, files in os.walk(config_file):

I think abs_user_config is read from some text file and is already decoded to unicode. As a result config_file is decoded as well.
So parent, dirs and files will contain unicode strings, and you'll be unable to control the way they are decoded from the FS.
If there was no intention to skip names like "In my language - ????" by doing strict decoding and avoid further problems while filling file_contents, then that code can look like:

for parent, dirs, files in os.walk(config_file):
  for y in [dir for dir in dirs if dir.startswith('.')]:
    dirs.remove(y)
  recursivefiles.extend([fn for fn in files if not fn.startswith('.')])

This is compatible with both python2 (which allows u".test".startswith('.')) and python3, since '.' is a str.
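
A self-contained version of the loop above might look like the following sketch. It assumes the standard library os.walk with a str path (not the wrapped walk from portage.os); the function name `collect_visible_files` is hypothetical, and joining each file name with its parent directory is an illustrative choice rather than something taken from the patch.

```python
import os


def collect_visible_files(config_file):
    """Return paths of non-hidden files below config_file, skipping hidden dirs."""
    recursivefiles = []
    for parent, dirs, files in os.walk(config_file):
        # Prune in place so that os.walk does not descend into hidden directories.
        dirs[:] = [d for d in dirs if not d.startswith(".")]
        recursivefiles.extend(
            os.path.join(parent, fn) for fn in files if not fn.startswith(".")
        )
    return sorted(recursivefiles)
```

Assigning to `dirs[:]` has the same pruning effect as repeated `dirs.remove(y)` calls, without modifying the list element by element.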
Comment 4 Nikolay Orlyuk 2009-12-06 12:00:33 UTC
(In reply to comment #3)
> If there was no intention to skip names like "In my language - ????" by doing
> strict decoding and avoid further problems while filling file_contents, 
Sorry, in python2, if something can't be decoded from the FS during os.walk, that causes a UnicodeDecodeError instead of the name being replaced with "?".
In python 3.1 it generates '\udcXX', where XX is the original byte, which causes problems when opening a file with such a name.
Anyway, as I see it, the code was written that way with the intention of keeping control over FS decoding and avoiding any of those problems.
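
The '\udcXX' behaviour can be seen without touching the filesystem. The sketch below uses the "surrogateescape" error handler on a hand-made byte string; the sample name and the helper `encodes_strictly` are illustrative assumptions, and it shows only the decoding mechanism, not any portage code.

```python
raw_name = b"caf\xe9"  # Latin-1 encoded name, not valid UTF-8

# Each undecodable byte 0xXX is mapped to the lone surrogate U+DCXX.
decoded = raw_name.decode("utf-8", "surrogateescape")


def encodes_strictly(name, encoding="utf-8"):
    """Return True if the name can be encoded without a special error handler."""
    try:
        name.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


# The original bytes are only recoverable with the same error handler.
round_tripped = decoded.encode("utf-8", "surrogateescape")
```

Any code path that encodes such a name strictly, for example when writing it to a UTF-8 text file, will fail on the surrogate.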
Comment 5 Zac Medico gentoo-dev 2009-12-06 12:08:52 UTC
I think the code is safe as is with my patch, although it will skip anything that's not encoded with utf-8 (_encodings['fs'] is currently hardcoded as utf-8). I suppose we can change all the config file code to use _encodings['merge'] instead, which corresponds to sys.getfilesystemencoding(). Currently, _encodings['merge'] is only used in the merge/unmerge code, but I think it makes sense to use it for user config files as well.
Comment 6 Zac Medico gentoo-dev 2009-12-06 12:24:43 UTC
(In reply to comment #4)
> Sorry, in python2 if something can't be decoded from FS during os.walk that
> will cause UnicodeDecodeError instead of replacing with "?".
> In python 3.1 it generates '\udcXX' where XX is original byte, which cause
> problems while opening file with such name.

Hopefully these issues are no problem because we have a wrapped version of os.walk (from portage.os) which encodes arguments from unicode to bytes. This is why walk yields bytes rather than unicode.
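
The same effect can be observed with the standard library alone: os.walk yields names of the same type as its argument. The sketch below does not reproduce the portage.os wrapper; `walk_types` is a hypothetical helper used only to show the type of what walk yields.

```python
import os


def walk_types(top):
    """Return the set of types that os.walk yields for paths and names under top."""
    seen = set()
    for parent, dirs, files in os.walk(top):
        seen.add(type(parent))
        seen.update(type(name) for name in dirs + files)
    return seen
```

Calling `walk_types(path)` with a str path gives only str, while `walk_types(os.fsencode(path))` gives only bytes, so code consuming a bytes-based walk has to compare and remove entries as bytes.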
Comment 7 Zac Medico gentoo-dev 2009-12-07 03:31:38 UTC
This is fixed in 2.1.7.11 and 2.2_rc56. If there are any remaining issues then please file a new bug.