Bug 554632

Summary: dev-lang/python-3.4.1: ?
Product: Gentoo Linux
Reporter: Rafał Mużyło <galtgendo>
Component: [OLD] Development
Assignee: Python Gentoo Team <python>
Status: RESOLVED INVALID
Severity: normal
Priority: Normal
Version: unspecified
Hardware: All
OS: Linux

Description Rafał Mużyło 2015-07-12 15:08:05 UTC
At this point, I'm not sure whether this is a bug in the code below or in Python itself.

Before Python 3.5, 'backslashreplace' is an encode-only error handler.
Yet there are snippets floating around that implement it for decoding.
This is one of them:

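import codecs

# Keep the built-in handler so that encode errors can still be delegated to it.
_backslashreplace_errors = codecs.backslashreplace_errors

def backslashreplace_errors(exc):
    if isinstance(exc, UnicodeDecodeError):
        # exc.object is the input bytes; indexing it yields ints.
        tohex = lambda c: "\\x{0:02x}".format(c)
        u = "".join([tohex(exc.object[c]) for c in range(exc.start, exc.end)])
        # Return the replacement text and the position at which to resume decoding.
        return (u, exc.end)
    return _backslashreplace_errors(exc)

# Replace the built-in 'backslashreplace' handler with the wrapper above.
codecs.register_error('backslashreplace', backslashreplace_errors)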
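
For context, here is a minimal sketch of what the stock handler does on the encode side, with the decode call guarded by a version check because built-in decode support only exists from Python 3.5 onward. The sample string is made up for illustration and is not taken from the report.

```python
import sys

# Encode side: supported by the built-in handler in every Python 3 release.
encoded = "n\xf6m".encode("ascii", "backslashreplace")
print(encoded)

# Decode side: only attempt this with the built-in handler on 3.5 or later.
decoded = None
if sys.version_info >= (3, 5):
    decoded = b"n\xf6m".decode("utf-8", "backslashreplace")
    print(decoded)
```

On older interpreters the decode branch is skipped, which is the gap the wrapper above tries to fill.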


Now let's take a few semi-random byte strings:
a=b"nm\xf6p\x00p"
b=b"nm\xf6p\x00pj"
c=b"nm\xf6p\x00pjk"

Now, c.decode('utf-8', 'backslashreplace') produces the expected result ('nm\\xf6p\x00pjk'), but decoding a or b does not. This happens on both amd64 and x86, though the semi-random part of the output differs between them.

For example, on x86 the results are 'nm\\xf6p\x00d' and 'nm\\xf6p\x00d\x02' ("'nm\\\\xf6p\\x00\\x00'" and "'nm\\\\xf6p\\x00\\x00\\x00'" with repr).
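
One way to narrow this down might be to compare each decoded result against the string it should produce. The sketch below is only an illustration: it registers an equivalent handler under a hypothetical name, `decode_backslashreplace`, so the built-in handler stays untouched, and the `expected` helper assumes that `\xf6` is the only invalid byte in the samples.

```python
import codecs

def decode_backslashreplace(exc):
    """Decode-side handler: replace each undecodable byte with \\xNN."""
    if isinstance(exc, UnicodeDecodeError):
        bad = exc.object[exc.start:exc.end]
        return ("".join("\\x{0:02x}".format(b) for b in bad), exc.end)
    raise exc

# Hypothetical handler name, chosen so the built-in one is not overridden.
codecs.register_error("decode_backslashreplace", decode_backslashreplace)

samples = {
    "a": b"nm\xf6p\x00p",
    "b": b"nm\xf6p\x00pj",
    "c": b"nm\xf6p\x00pjk",
}

def expected(data):
    # Assumption: \xf6 is the only invalid byte; everything else is ASCII.
    return data.replace(b"\xf6", b"\\xf6").decode("ascii")

results = {
    name: data.decode("utf-8", "decode_backslashreplace")
    for name, data in samples.items()
}
mismatches = {
    name: (results[name], expected(data))
    for name, data in samples.items()
    if results[name] != expected(data)
}
print(mismatches or "all samples decode as expected")
```

Any entry in `mismatches` shows the actual and expected strings side by side, which makes it easier to see whether the handler's return value or the decoder's handling of the remaining bytes is at fault.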

So, am I doing something wrong, or does the problem lie within Python?

Comment 1 Dirkjan Ochtman (RETIRED) gentoo-dev 2015-07-12 15:11:24 UTC
This is not a Gentoo bug. Please take your question to the Python community, as suggested elsewhere.