Gentoo Websites Logo
Go to: Gentoo Home Documentation Forums Lists Bugs Planet Store Wiki Get Gentoo!
Bug 914722 - portage unicode encoding wrapper for shutil breaks shutil.move()
Summary: portage unicode encoding wrapper for shutil breaks shutil.move()
Status: RESOLVED FIXED
Alias: None
Product: Portage Development
Classification: Unclassified
Component: Core (show other bugs)
Hardware: All Linux
: Normal normal (vote)
Assignee: Portage team
URL: https://docs.python.org/3/library/os....
Whiteboard:
Keywords: InVCS
Depends on: 915120
Blocks:
  Show dependency tree
 
Reported: 2023-09-25 23:38 UTC by Brian Norris
Modified: 2023-10-20 00:51 UTC (History)
3 users (show)

See Also:
Package list:
Runtime testing required: ---


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Brian Norris 2023-09-25 23:38:04 UTC
NB: my working portage environment is forked off of portage 2.3.75 due to regressions in recent portage. But code inspection shows the issue still applies to mainline portage, at least as of this git hash:

6293f7ba670f vartree: fix syntax

---

lib/portage/__init__.py contains a _unicode_module_wrapper which is applied to the os and shutil libraries, such that all function's args are encoded to bytes before being sent down to the python standard library. This seems to work for most code in the tree today, but it breaks certain usages of shutil.move(). Specifically, it breaks code I'm adding locally; I'm not aware of any existing portage code that this breaks, although it's a bit hard to audit that.

For shutil.move() in particular, things break when the source or destination are directories, and os.rename() doesn't work for $reasons. But presumably shutil is not really audited for use with bytes, so all sorts of things could potentially break.

Litmus test to show how bytes breaks on python3.11 today:

```
$ python3
Python 3.11.5 (main, Aug 25 2023, 07:43:52) [GCC 12.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import shutil
>>> import os
>>> os.mkdir("foo")
>>> os.mkdir("bar")
>>> from pathlib import Path
>>> Path("foo/b").touch()


#### Bytes is broken:
>>> shutil.move(b"foo/b", b"bar/")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.11/shutil.py", line 820, in move
    real_dst = os.path.join(dst, _basename(src))
                                 ^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/shutil.py", line 781, in _basename
    return os.path.basename(path.rstrip(sep))
                            ^^^^^^^^^^^^^^^^
TypeError: a bytes-like object is required, not 'str'

### Strings work fine:
>>> shutil.move("foo/b", "bar/")
```

Personally, this low-level mucking with the os and shutil libraries looks pretty wrong to me, and now that portage has dropped python2 compatibility, it could probably be dropped altogether now.

Reproducible: Always

Steps to Reproduce:
1. add code to portage that uses shutil.move() on directories
Actual Results:  
shutil.move() choked on bytes, with errors like:

[...]
  File "/usr/lib/python3.8/site-packages/portage/__init__.py", line 246, in __call__
    rval = self._func(*wrapped_args, **wrapped_kwargs)
  File "/usr/lib/python3.8/shutil.py", line 798, in move
    if _destinsrc(src, dst):
  File "/usr/lib/python3.8/shutil.py", line 818, in _destinsrc
    if not src.endswith(os.path.sep):
TypeError: endswith first arg must be bytes or a tuple of bytes, not str
ebuild-ipc: during write: subprocess failure: 2

Expected Results:  
shutil.move() works like a standard python installation should work
Comment 1 Sam James archtester Gentoo Infrastructure gentoo-dev Security 2023-09-25 23:45:42 UTC
Yeah, I really want to get rid of this.

It's a complete waste of CPU time and it uglies up everything it touches. It's just going to take a lot of manhours to get anywhere with that.
Comment 2 Sam James archtester Gentoo Infrastructure gentoo-dev Security 2023-09-25 23:48:31 UTC
(adding old PR to see also, it's not being actively worked on by them though)
Comment 3 Brian Norris 2023-09-26 00:06:51 UTC
Thanks. It's unlikely I'll try to resurrect that PR on their behalf but for context ... any clue why that PR changed the goalposts from "remove unicode coercions" to "use Path everywhere"? Are there subtle bugs in path construction related to use of bytes vs string or something? I just wouldn't want to let real advancement (the performance gains noted there; or ability to use standard libraries properly) to be held back by some theoretical nicety around using modern path libraries.
Comment 4 Zac Medico gentoo-dev 2023-09-26 01:06:13 UTC
I'd like to eliminate portage's _unicode_module_wrapper as much as anyone else, but this looks like a valid python bug. For example, https://bugs.python.org/issue46727 is open, and https://bugs.python.org/issue18283 was fixed in python 3.8.
Comment 5 Brian Norris 2023-09-26 01:31:40 UTC
> https://bugs.python.org/issue46727 is open

The current expressed preference there is:
"""
(1) We document in the shutil docs that only str paths are officially supported. Bytes paths may sometimes work, but do it at your own risk.
"""

So that's not helpful :)

> https://bugs.python.org/issue18283 was fixed in python 3.8.

They may have claimed to have fixed one shutil function in there, but some of the claims in there are false -- for example, they claim that shutil.move() supports bytes, but they clearly don't have complete test coverage. shutil.move() does have *some* code paths that work with bytes, but others (like the one I outline here, involving directories) has been broken for quite some time.

So I'd go back to option (1) above: that "Bytes paths may sometimes work, but do it at your own risk".

i.e., we shouldn't purposely force all of the Portage codebase into this risk.
Comment 6 Zac Medico gentoo-dev 2023-09-26 02:45:42 UTC
(In reply to Brian Norris from comment #5)
> So I'd go back to option (1) above: that "Bytes paths may sometimes work,
> but do it at your own risk".
> 
> i.e., we shouldn't purposely force all of the Portage codebase into this
> risk.

Prior to PEP 540, I think the simplest path forward would have been to make _unicode_module_wrapper a no-op in cases when we have a UTF-8 locale, since the point of _unicode_module_wrapper was to handle icky cases where os.fsencode/os.fsdecode would fail with a non-UTF8 locale. Now it seems like UTF8-Mode exists tosolve this problem:

https://docs.python.org/3/library/os.html#python-utf-8-mode
Comment 7 Zac Medico gentoo-dev 2023-09-26 18:05:53 UTC
Since it's up to the user to enable UTF8-mode via PYTHONUTF8=1, LC_CTYPE=C, or LC_CTYPE=POSIX, I suppose portage could simple abort with a helpful error message if UTF8-mode is not enabled and the current locale does not use UTF8 encoding.
Comment 8 Mike Gilbert gentoo-dev 2023-09-26 18:08:50 UTC
(In reply to Zac Medico from comment #7)

Or we could force it via shebang.

> #!/usr/bin/env -S python3 -X utf8
Comment 9 Zac Medico gentoo-dev 2023-09-27 06:08:46 UTC
(In reply to Mike Gilbert from comment #8)
> (In reply to Zac Medico from comment #7)
> 
> Or we could force it via shebang.
> 
> > #!/usr/bin/env -S python3 -X utf8

Yeah that's a great idea! Alternatively, we could even do something as radical as re-exec python in order to force it into UTF8-mode.
Comment 10 Zac Medico gentoo-dev 2023-09-30 06:21:58 UTC
In https://github.com/gentoo/portage/pull/1103 I have a minimal draft that's based on this simple change which disables _unicode_func_wrapper usage for UTF-8 locales:

> --- a/lib/portage/__init__.py
> +++ b/lib/portage/__init__.py
> @@ -18,2 +18,3 @@ try:
>          errno.ESTALE = -1
> +    import locale
>      import multiprocessing.util
> @@ -188,2 +189,3 @@ except ImportError as e:
>  
> +utf8_mode = locale.getpreferredencoding(do_setlocale=False).lower() == "utf-8"
>  
> @@ -334,3 +336,3 @@ class _unicode_module_wrapper:
>              result = override
> -        elif isinstance(result, type):
> +        elif utf8_mode or isinstance(result, type):
>              pass

With just a few more changes we can also disable _unicode_module_wrapper entirely (it does provide an os.waitpid wrapper for EINTR retry, but since python 3.5 it is handled by python).
Comment 11 Zac Medico gentoo-dev 2023-10-10 03:03:34 UTC
I think https://github.com/gentoo/portage/pull/1103 is about ready for merge now. 

We should probably test it under selinux in case removal of unicode wrappers from causes trouble, with the selinux modules, but it looks like portage/_selinux.py already performs encoding/decoding that prevented the unicode wrappers from having any substantial effect.

For the part about forcing python into utf8 mode, I suppose we can add that as a separate feature, since I imagine the vast majority of users likely have utf8 locales already.
Comment 12 Larry the Git Cow gentoo-dev 2023-10-11 19:08:20 UTC
The bug has been referenced in the following commit(s):

https://gitweb.gentoo.org/proj/portage.git/commit/?id=4f5f6f571e52af6d2703db760bad4e0ad7439d5a

commit 4f5f6f571e52af6d2703db760bad4e0ad7439d5a
Author:     Zac Medico <zmedico@gentoo.org>
AuthorDate: 2023-10-10 01:36:02 +0000
Commit:     Zac Medico <zmedico@gentoo.org>
CommitDate: 2023-10-11 19:05:02 +0000

    Support Python UTF-8 Mode via portage.utf8_mode (bug 914722)
    
    When a UTF-8 locale, or UTF-8 mode is detected, set
    portage.utf8_mode to True, and do not wrap file access
    with _unicode_func_wrapper. This is intended to mitgate
    issues with byte string handling in python libraries
    like shutil, as reported in bug 914722.
    
    This patch is intended to be a simple and minimal
    implementation that can be optimized later through
    the elimination of unecessary encoding/decoding.
    
    The str() wrapping in the unit tests is for lazily
    evaluated instances of lazy_value, which is used to
    account for mock portage.const.EPREFIX values that
    exist during unit tests.
    
    Bug: https://bugs.gentoo.org/914722
    Signed-off-by: Zac Medico <zmedico@gentoo.org>

 NEWS                                               |  4 +++
 lib/portage/__init__.py                            |  4 +++
 lib/portage/_sets/__init__.py                      | 10 ++++--
 lib/portage/dbapi/vartree.py                       | 11 ++++--
 lib/portage/gpkg.py                                | 10 +++++-
 lib/portage/package/ebuild/doebuild.py             |  4 +++
 lib/portage/tests/dbapi/test_portdb_cache.py       |  2 +-
 lib/portage/tests/ebuild/test_fetch.py             |  2 +-
 lib/portage/tests/emerge/test_config_protect.py    | 14 +++++---
 .../emerge/test_emerge_blocker_file_collision.py   |  7 +++-
 lib/portage/tests/emerge/test_emerge_slot_abi.py   | 14 ++++++--
 lib/portage/tests/emerge/test_simple.py            | 40 +++++++++++++++-------
 lib/portage/tests/resolver/ResolverPlayground.py   |  4 +--
 lib/portage/tests/sync/test_sync_local.py          |  2 +-
 lib/portage/tests/util/test_getconfig.py           |  2 +-
 lib/portage/xpak.py                                |  5 +++
 16 files changed, 105 insertions(+), 30 deletions(-)
Comment 13 Larry the Git Cow gentoo-dev 2023-10-20 00:51:32 UTC
The bug has been closed via the following commit(s):

https://gitweb.gentoo.org/repo/gentoo.git/commit/?id=3500483f75789c36e379c1b6546a09df7c11e2b1

commit 3500483f75789c36e379c1b6546a09df7c11e2b1
Author:     Sam James <sam@gentoo.org>
AuthorDate: 2023-10-20 00:49:33 +0000
Commit:     Sam James <sam@gentoo.org>
CommitDate: 2023-10-20 00:51:00 +0000

    sys-apps/portage: add 3.0.53
    
    Closes: https://bugs.gentoo.org/915120
    Closes: https://bugs.gentoo.org/821529
    Closes: https://bugs.gentoo.org/914441
    Closes: https://bugs.gentoo.org/914722
    Closes: https://bugs.gentoo.org/914873
    Closes: https://bugs.gentoo.org/915099
    Closes: https://bugs.gentoo.org/915123
    Closes: https://bugs.gentoo.org/915128
    Closes: https://bugs.gentoo.org/915136
    Closes: https://bugs.gentoo.org/915330
    Closes: https://bugs.gentoo.org/915494
    Closes: https://bugs.gentoo.org/915834
    Closes: https://bugs.gentoo.org/915903
    Closes: https://bugs.gentoo.org/900224
    Signed-off-by: Sam James <sam@gentoo.org>

 sys-apps/portage/Manifest              |   1 +
 sys-apps/portage/portage-3.0.53.ebuild | 238 +++++++++++++++++++++++++++++++++
 2 files changed, 239 insertions(+)