When attempting to access the XML for bug #114628 via
http://bugs.gentoo.org/show_bug.cgi?ctype=xml&id=114628
my XML parser gives the following error:

  'utf8' codec can't decode byte 0xbf

The byte it can't decode is the upside-down '?' in http://bugs.gentoo.org/114628#c5

The three XML validators that I tried all complain about the character:

http://www.w3.org/2001/03/webdata/xsv?docAddrs=http%3A%2F%2Fbugs.gentoo.org%2Fshow_bug.cgi%3Fctype%3Dxml%26amp%3Bid%3D114628&style=xsl#
http://www.ltg.ed.ac.uk/~richard/xml-check.cgi?url=http%3A%2F%2Fbugs.gentoo.org%2Fshow_bug.cgi%3Fctype%3Dxml%26amp%3Bid%3D114628
http://feedvalidator.org/check.cgi?url=http%3A%2F%2Fbugs.gentoo.org%2Fshow_bug.cgi%3Fctype%3Dxml%26amp%3Bid%3D114628
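
For anyone wanting to reproduce the codec error without fetching the bug, here is a minimal sketch. It assumes the comment text is stored as ISO-8859-1, which is a guess based on the 0xbf byte and not something confirmed from the server. The variable names are only for illustration.

```python
# 0xBF is the byte reported in the error message above.
raw = b"\xbf"

# As UTF-8, a lone 0xBF is a continuation byte with no lead byte, so decoding fails.
try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

# As ISO-8859-1 (assumed encoding), the same byte is the inverted question mark.
latin1_char = raw.decode("iso-8859-1")
```

If that assumption holds, the data itself is fine and only the encoding the parser assumes is wrong.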
I've also run into this problem while looking at certain bugs with my pybugz script. If you look at bug 122500:

http://bugs.gentoo.org/show_bug.cgi?id=122500&ctype=xml

and try to parse it in elementtree using:

  from elementtree import ElementTree
  from urllib2 import urlopen
  ElementTree.parse(urlopen("http://bugs.gentoo.org/show_bug.cgi?id=122500&ctype=xml"))

you'll get an error on line 2108, which is where some accented characters are.

The solution to this is to change the XML declaration in the Bugzilla XML template so that it includes the character encoding. In ./template/en/default/bug/show.xml.tmpl, change:

  <?xml version="1.0" standalone="yes"?>

to:

  <?xml version="1.0" encoding="ISO-8859-1" ?>
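
The effect of the proposed declaration change can be checked offline with a rough sketch like the one below. It uses the standard library's xml.etree.ElementTree (the modern home of elementtree) on a made-up two-element document, not the real Bugzilla output. The element names, the sample text and the try_parse helper are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for a bug comment containing the ISO-8859-1 byte 0xBF.
BODY = b"<bug><thetext>\xbfQue?</thetext></bug>"

# Declaration as the template currently emits it: no encoding, so parsers assume UTF-8.
WITHOUT_ENCODING = b'<?xml version="1.0" standalone="yes"?>' + BODY

# Declaration as proposed above.
WITH_ENCODING = b'<?xml version="1.0" encoding="ISO-8859-1" ?>' + BODY


def try_parse(data):
    """Return the comment text, or None if the parser rejects the document."""
    try:
        return ET.fromstring(data).find("thetext").text
    except ET.ParseError:
        return None


undeclared = try_parse(WITHOUT_ENCODING)  # parser rejects the 0xBF byte
declared = try_parse(WITH_ENCODING)       # parser decodes it as ISO-8859-1
```

Note this only helps if everything in the database really is ISO-8859-1. Any comment stored in another encoding would be decoded to the wrong characters instead of raising an error.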
Moving open bugzilla bugs to the new bugzilla group (because I'm about to stab lots of these bugs).
Fixed with the new version (your validators all pass it).