Bug 554632 - dev-lang/python-3.4.1: ?
Summary: dev-lang/python-3.4.1: ?
Status: RESOLVED INVALID
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: [OLD] Development
Hardware: All Linux
Importance: Normal normal
Assignee: Python Gentoo Team
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2015-07-12 15:08 UTC by Rafał Mużyło
Modified: 2015-07-12 15:11 UTC
0 users

See Also:
Package list:
Runtime testing required: ---


Description Rafał Mużyło 2015-07-12 15:08:05 UTC
At this point, I'm not sure whether this is a bug in this code or in Python itself.

Until Python 3.5, 'backslashreplace' is an encode-only error handler.
Yet there are snippets floating around that implement it for decoding.
This is one of them:

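import codecs

# Keep a reference to the built-in handler for the non-decode cases.
_backslashreplace_errors = codecs.backslashreplace_errors

def backslashreplace_errors(exc):
    if isinstance(exc, UnicodeDecodeError):
        # Render each undecodable byte as a literal \xNN escape.
        tohex = lambda c: "\\x{0:02x}".format(c)
        u = "".join(tohex(exc.object[c]) for c in range(exc.start, exc.end))
        # Resume decoding right after the offending bytes.
        return (u, exc.end)
    # Anything else goes to the original handler.
    return _backslashreplace_errors(exc)

# Replace the registered 'backslashreplace' handler with the extended one.
codecs.register_error('backslashreplace', backslashreplace_errors)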
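If you want the decode-side behavior without overriding the built-in name, the same idea can be registered under a separate handler name. This is only a sketch: `backslashreplace_decode` is a hypothetical name chosen for illustration, and the handler covers decode errors only, re-raising anything else rather than delegating to `codecs.backslashreplace_errors`.

```python
import codecs

def backslashreplace_decode(exc):
    """Decode-side backslashreplace: escape each undecodable byte as \\xNN."""
    if isinstance(exc, UnicodeDecodeError):
        bad = exc.object[exc.start:exc.end]  # bytes, so iterating yields ints
        return ("".join("\\x{0:02x}".format(b) for b in bad), exc.end)
    # Encode/translate errors are out of scope for this sketch: fail loudly.
    raise exc

# Hypothetical name, so the interpreter's own 'backslashreplace' stays untouched.
codecs.register_error("backslashreplace_decode", backslashreplace_decode)

print(repr(b"caf\xe9".decode("utf-8", "backslashreplace_decode")))  # prints 'caf\\xe9'
```

Keeping a distinct name means other code that asks for 'backslashreplace' still gets the interpreter's own handler. This is not claimed to avoid the 3.4.1 behavior reported here.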


Now let's take a few semi-random byte strings:
a=b"nm\xf6p\x00p"
b=b"nm\xf6p\x00pj"
c=b"nm\xf6p\x00pjk"

Now, c.decode('utf-8', 'backslashreplace') produces the expected result ('nm\\xf6p\x00pjk'), yet neither a nor b does. This happens on both amd64 and x86, though the garbage characters in the output differ between the two.

For example, on x86 a and b decode to 'nm\\xf6p\x00d' and 'nm\\xf6p\x00d\x02' respectively ("'nm\\\\xf6p\\x00\\x00'" and "'nm\\\\xf6p\\x00\\x00\\x00'" with repr).
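For comparison, Python 3.5 and newer ship decode support in the built-in 'backslashreplace' handler, so the expected results for a, b and c can be checked there without any custom registration. A minimal sketch, assuming a 3.5+ interpreter (it exits early on older versions rather than running the check):

```python
import sys

# The built-in 'backslashreplace' handler only gained decode support in 3.5.
if sys.version_info < (3, 5):
    raise SystemExit("this check needs Python 3.5 or newer")

samples = {
    "a": b"nm\xf6p\x00p",
    "b": b"nm\xf6p\x00pj",
    "c": b"nm\xf6p\x00pjk",
}
# Only the invalid \xf6 byte should be escaped; everything else decodes as-is.
expected = {
    "a": "nm\\xf6p\x00p",
    "b": "nm\\xf6p\x00pj",
    "c": "nm\\xf6p\x00pjk",
}

results = {}
for name, data in samples.items():
    results[name] = data.decode("utf-8", "backslashreplace")
    print(name, repr(results[name]), results[name] == expected[name])
```

Each line should end in True; the point of comparison is that a and b keep their trailing characters intact.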

So, am I doing something wrong, or does the problem lie within Python?
Comment 1 Dirkjan Ochtman (RETIRED) gentoo-dev 2015-07-12 15:11:24 UTC
This is not a Gentoo bug. Please take your question to the Python community, as suggested elsewhere.