I get the following error with Python 3.3 as the default interpreter:

Traceback (most recent call last):
  File "setup.py", line 23, in <module>
    long_description=open('README.txt','r').read(),
  File "/usr/lib64/python3.3/encodings/ascii.py", line 26, in decode
    return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 14839: ordinal not in range(128)

Reproducible: Always
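The failure can be reproduced without the package itself. This is a minimal sketch, assuming only that README.txt contains a non-ASCII byte sequence starting with 0xe2; the sample text and variable names are made up for illustration. On Python 3.3, open() in text mode uses the locale's preferred encoding, which is ASCII under a C/POSIX locale, so the read amounts to an ASCII decode of the file's bytes:

```python
# Hypothetical sample: ASCII text with one UTF-8 encoded non-ASCII character.
raw = b"UTC+0800 \xe2\x80\x94 hours"

error = None
try:
    # Roughly what open('README.txt', 'r').read() does under an ASCII locale.
    raw.decode("ascii")
except UnicodeDecodeError as exc:
    error = exc
    print(exc)
```

The exception reports the offset of the first offending byte, which is how position 14839 in the traceback can be mapped back to a spot in the file.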
Created attachment 372800: patch to fix the problem
Got hit too. Haven't tried the patch.
Comment on attachment 372800 (patch to fix the problem):

This will cause an error in Python 2.6 and 2.7.
Here's a hint. http://python3porting.com/problems.html#reading-from-files
Or simply patch the README; that character is probably an error.

--- README.txt 2014-03-14 09:58:59.000000000 +0100
+++ /root/README.txt 2014-03-18 01:12:47.988642649 +0100
@@ -425,7 +425,7 @@
 measurement.
 
 All other timezones are defined relative to UTC, and include offsets like
-UTC+0800 — hours to add or subtract from UTC to derive the local time. No
+UTC+0800 - hours to add or subtract from UTC to derive the local time. No
 daylight saving time occurs in UTC, making it a useful timezone to perform
 date arithmetic without worrying about the confusion and ambiguities caused
 by daylight saving time transitions, your country changing its timezone, or
This was fixed by way of a patch in many other Python packages long ago, and sadly it still occurs. I believe the essence of the hint is this:

    infile = codecs.open('UTF-8.txt', 'r', encoding='UTF-8')

It's not the README you patch but the code that opens it. There ought still to be such patches in python herd packages.
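One way to apply the hint in a setup.py that has to run on Python 2.6, 2.7 and 3.x is io.open, which accepts an encoding argument on all of those versions (codecs.open, as quoted above, is another option). The following is only a sketch and not the patch attached to this bug; the temporary file stands in for README.txt and its content is made up:

```python
import io
import os
import tempfile

# Stand-in for README.txt: a UTF-8 file containing one non-ASCII character.
path = os.path.join(tempfile.mkdtemp(), "README.txt")
with io.open(path, "w", encoding="utf-8") as f:
    f.write(u"UTC+0800 \u2014 hours to add or subtract from UTC")

# Explicit encoding: the result no longer depends on the build locale.
with io.open(path, "r", encoding="utf-8") as f:
    long_description = f.read()

print(repr(long_description))
```

Passing encoding= to the built-in open() would work on Python 3 only, since the Python 2 built-in does not take that argument.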
(In reply to Francesco Riosa from comment #5)
> Or simply patch readme, that character is probably an error

It's not an error. I looked at it in a hex editor, and it's three bytes, 0xe2 0x80 0x94, which is definitely UTF-16 encoding. Extract out the relevant bits and re-assemble into a 2-byte value, and you get U+2014, also known as an "Em Dash" in the Unicode char set:
http://www.fileformat.info/info/unicode/char/2014/index.htm

It probably means that the README.txt file was typed up in an editor that attempts to do some kind of "smart replacement" of specific characters with correct typographic replacements. Like MS Word and those accursed smart quotes...
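The bit extraction described in the comment above can be written out by hand. This sketch assumes the three-byte layout 1110xxxx 10xxxxxx 10xxxxxx, which is the layout UTF-8 uses for code points from U+0800 to U+FFFF; the variable names are illustrative:

```python
# The three bytes seen in the hex editor.
b0, b1, b2 = 0xE2, 0x80, 0x94

# Drop the marker bits (1110 on the lead byte, 10 on each continuation byte)
# and concatenate the remaining 4 + 6 + 6 payload bits.
code_point = ((b0 & 0x0F) << 12) | ((b1 & 0x3F) << 6) | (b2 & 0x3F)

print("U+%04X" % code_point)
```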
(In reply to Joshua Kinard from comment #7)
> It's not an error. I looked at it in a hex editor, and it's three bytes,
> 0xe2 0x80 0x94, which is definitely UTF-16 encoding.

Hello,

it may well not be an error, but as things stand an entire update process is stopped (at least in my case) because someone put some weird characters in a README file. IMHO this should be avoided...

Thanks,
Fausto
(In reply to Joshua Kinard from comment #7)
> It's not an error. I looked at it in a hex editor, and it's three bytes,
> 0xe2 0x80 0x94, which is definitely UTF-16 encoding.

It's actually UTF-8. ^_^

+  22 Mar 2014; Mike Gilbert <floppym@gentoo.org>
+  +files/pytz-2014.1-setup.py.patch, pytz-2014.1.ebuild:
+  Specify the correct encoding when opening README.txt in setup.py, bug 504778
+  by Alex Turbov.
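The encoding question can be checked directly in an interpreter. A small sketch using only the standard library: the three bytes decode cleanly as UTF-8 to a single character, while a UTF-16 decoder rejects them because three bytes is not a whole number of 16-bit units.

```python
import unicodedata

raw = b"\xe2\x80\x94"

# As UTF-8 the three bytes form exactly one character.
as_utf8 = raw.decode("utf-8")
print(len(as_utf8), unicodedata.name(as_utf8))

# As UTF-16 the third byte is left over, so strict decoding fails.
try:
    raw.decode("utf-16-le")
    utf16_ok = True
except UnicodeDecodeError:
    utf16_ok = False
print(utf16_ok)
```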
+*pytz-2014.1.1 (22 Mar 2014)
+
+  22 Mar 2014; Mike Gilbert <floppym@gentoo.org> +pytz-2014.1.1.ebuild,
+  -files/pytz-2014.1-setup.py.patch, -pytz-2014.1.ebuild:
+  Upstream already fixed the encoding issue by removing the non-ASCII character
+  from README.txt. Bug 504778.
(In reply to Mike Gilbert from comment #9)
> It's actually UTF-8. ^_^

Oh, I was going by the number of encoded bits when stating UTF-16, as U+2014 is greater than 0x0800 and less than 0xffff, which is 16 bits of code point.

https://en.wikipedia.org/wiki/UTF-8#Description

But now I see that there are clear differences between UTF-8, UTF-16, and even UTF-32. Oy...
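The difference between the three encodings shows up in how many bytes each needs for the same code point. A short sketch; the explicit big-endian codec names are used only so that no byte order mark is added to the output:

```python
em_dash = u"\u2014"

# Encoded size in bytes of U+2014 under each encoding.
sizes = dict(
    (codec, len(em_dash.encode(codec)))
    for codec in ("utf-8", "utf-16-be", "utf-32-be")
)
print(sizes)
```

A 16-bit code point fits in one UTF-16 unit (two bytes), but UTF-8 spends three bytes on it because each byte also carries marker bits.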