At this point, I'm not sure whether this is a bug in this code or in Python itself.

Up till Python 3.5, 'backslashreplace' is an encode-only handler. Yet there are snippets floating about that implement it for decoding. This is one of them:

import codecs

_backslashreplace_errors = codecs.backslashreplace_errors

def backslashreplace_errors(exc):
    if isinstance(exc, UnicodeDecodeError):
        tohex = lambda c: "\\x{0:02x}".format(c)
        u = "".join([tohex(exc.object[c]) for c in range(exc.start, exc.end)])
        return (u, exc.end)
    return _backslashreplace_errors(exc)

codecs.register_error('backslashreplace', backslashreplace_errors)

Now let's take a few semi-random byte strings:

a = b"nm\xf6p\x00p"
b = b"nm\xf6p\x00pj"
c = b"nm\xf6p\x00pjk"

Now, c.decode('utf-8', 'backslashreplace') produces the expected result ('nm\\xf6p\x00pjk'), yet neither a nor b does, and that happens on both amd64 and x86 (though the semi-random part differs). For example, on x86 the results are 'nm\\xf6p\x00d' and 'nm\\xf6p\x00d\x02' ("'nm\\\\xf6p\\x00\\x00'" and "'nm\\\\xf6p\\x00\\x00\\x00'" with repr).

So, am I doing something wrong, or does the problem lie within Python?
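
To check this against another interpreter, the handler quoted above can be reduced to a self-contained script. This is only a sketch: the handler name 'backslashreplace_decode' is made up for illustration, so that the built-in 'backslashreplace' is not overridden, and the script assumes Python 3, where indexing a bytes object yields integers.

```python
import codecs


def backslashreplace_decode(exc):
    """Decode-side handler: replace each undecodable byte with a \\xNN escape."""
    if not isinstance(exc, UnicodeDecodeError):
        # This sketch only covers decoding; anything else is re-raised.
        raise exc
    bad = exc.object[exc.start:exc.end]  # the offending bytes
    replacement = "".join("\\x{0:02x}".format(b) for b in bad)
    return (replacement, exc.end)  # resume decoding right after the bad bytes


# Hypothetical handler name, chosen so the built-in handler stays untouched.
codecs.register_error("backslashreplace_decode", backslashreplace_decode)

samples = [b"nm\xf6p\x00p", b"nm\xf6p\x00pj", b"nm\xf6p\x00pjk"]
results = [s.decode("utf-8", "backslashreplace_decode") for s in samples]

for s, r in zip(samples, results):
    print(repr(s), "->", repr(r))
```

On an interpreter without the problem, each result should be the input with only the \xf6 byte replaced by the four characters \xf6, and the trailing bytes decoded unchanged. Any other tail for the first two samples would point at the decoder rather than at the handler.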
This is not a Gentoo bug. Please take your question to the Python community, as suggested elsewhere.