554632 – dev-lang/python-3.4.1: ?

Status:	RESOLVED INVALID

Alias:	None

Product:	Gentoo Linux
Classification:	Unclassified
Component:	[OLD] Development
Hardware:	All Linux

Importance:	Normal normal
Assignee:	Python Gentoo Team

URL:
Whiteboard:
Keywords:

Depends on:
Blocks:

Reported:	2015-07-12 15:08 UTC by Rafał Mużyło
Modified:	2015-07-12 15:11 UTC
CC List:	0 users

See Also:
Package list:
Runtime testing required:	---

Description Rafał Mużyło 2015-07-12 15:08:05 UTC

At this point, I'm not sure whether this is a bug in the code below or in Python itself.

Before Python 3.5, 'backslashreplace' is an encode-only error handler.
Yet there are snippets floating around that implement it for decoding as well.
This is one of them:

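import codecs

# Keep a reference to the built-in (encode-only) handler to use as a fallback.
_backslashreplace_errors = codecs.backslashreplace_errors

def backslashreplace_errors(exc):
    # On decode errors, replace each offending byte with a \xNN escape.
    if isinstance(exc, UnicodeDecodeError):
        u = "".join("\\x{0:02x}".format(b) for b in exc.object[exc.start:exc.end])
        return (u, exc.end)
    # Everything else goes to the original handler.
    return _backslashreplace_errors(exc)

# Override the built-in 'backslashreplace' handler with the decode-aware version.
codecs.register_error('backslashreplace', backslashreplace_errors)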


Now let's take a few semi-random byte strings:
a=b"nm\xf6p\x00p"
b=b"nm\xf6p\x00pj"
c=b"nm\xf6p\x00pjk"

Now, c.decode('utf-8', 'backslashreplace') produces the expected result ('nm\\xf6p\x00pjk'), yet the same call on a or b does not. This happens on both amd64 and x86, though the incorrect trailing characters differ between the two.

For example, on x86 the results for a and b are 'nm\\xf6p\x00d' and 'nm\\xf6p\x00d\x02' respectively ("'nm\\\\xf6p\\x00\\x00'" and "'nm\\\\xf6p\\x00\\x00\\x00'" with repr).
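To make the comparison repeatable, here is a minimal sketch that decodes all three byte strings and checks each result against a reference value. It registers the handler under a hypothetical name, 'backslashreplace_decode' (not part of the original snippet), so the built-in 'backslashreplace' stays untouched. The expected() helper is also an assumption made for illustration, and it is only valid for these samples, where the single non-ASCII byte (0xf6) can never start a valid UTF-8 sequence.

```python
import codecs

def backslashreplace_decode(exc):
    """Decode-side handler: turn each undecodable byte into a \\xNN escape."""
    if isinstance(exc, UnicodeDecodeError):
        bad = exc.object[exc.start:exc.end]
        return ("".join("\\x{0:02x}".format(b) for b in bad), exc.end)
    # This sketch only covers decoding; re-raise anything else.
    raise exc

# Hypothetical handler name, chosen so the built-in handler is not overridden.
codecs.register_error("backslashreplace_decode", backslashreplace_decode)

samples = {
    "a": b"nm\xf6p\x00p",
    "b": b"nm\xf6p\x00pj",
    "c": b"nm\xf6p\x00pjk",
}

def expected(data):
    # Reference result for these samples only: ASCII bytes map to themselves,
    # every other byte becomes a \xNN escape.
    return "".join(chr(b) if b < 0x80 else "\\x{0:02x}".format(b) for b in data)

results = {}
for name, data in sorted(samples.items()):
    got = data.decode("utf-8", "backslashreplace_decode")
    results[name] = (got == expected(data))
    print(name, repr(got), "OK" if results[name] else "MISMATCH")
```

On an interpreter without the problem, all three lines should report OK. A MISMATCH for a or b while c passes would match the behaviour described above.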

So, am I doing something wrong, or does the problem lie within Python?

Comment 1 Dirkjan Ochtman (RETIRED) gentoo-dev 2015-07-12 15:11:24 UTC

This is not a Gentoo bug. Please take your question to the Python community, as suggested elsewhere.