When I run any glsa-check query that reads 200701-12 it chokes on the unicode in someone's name.

Reproducible: Always

Steps to Reproduce:
1. glsa-check -l 200701-12

Actual Results:
% glsa-check -l 200701-12                              chrisg@AmonDin:/home/chrisg
[A] means this GLSA was already applied,
[U] means the system is not affected and
[N] indicates that the system might be affected.

Traceback (most recent call last):
  File "/usr/bin/glsa-check", line 205, in ?
    sys.exit(summarylist(glsalist))
  File "/usr/bin/glsa-check", line 171, in summarylist
    myglsa = Glsa(myid, glsaconfig)
  File "/usr/lib/gentoolkit/pym/glsa.py", line 414, in __init__
    self.read()
  File "/usr/lib/gentoolkit/pym/glsa.py", line 432, in read
    self.parse(urllib.urlopen(myurl))
  File "/usr/lib/gentoolkit/pym/glsa.py", line 470, in parse
    self.description = getText(myroot.getElementsByTagName("description")[0], format="xml")
  File "/usr/lib/gentoolkit/pym/glsa.py", line 233, in getText
    return str(rValue)
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in position 8: ordinal not in range(128)

Expected Results:
glsa-check should handle unicode properly.

In my environment the following 'quick fix' avoids the crash but replaces the two characters with question marks. I suspect this is not the 'correct' fix.

% diff -c /usr/lib/gentoolkit/pym/glsa.py /home/chrisg/temp/      chrisg@AmonDin:/home/chrisg/temp
*** /usr/lib/gentoolkit/pym/glsa.py Wed Jan 17 01:13:07 2007
--- /home/chrisg/temp/glsa.py Wed Jan 17 02:25:48 2007
***************
*** 230,235 ****
--- 230,236 ----
      if format == "strip":
          rValue = rValue.strip(" \n\t")
          rValue = re.sub("[\s]{2,}", " ", rValue)
+     rValue=rValue.encode('ascii','replace') # fix to handle unicode input
      return str(rValue)

  def getMultiTagsText(rootnode, tagname, format):
*** Bug 162532 has been marked as a duplicate of this bug. ***
just emerge --sync; we've replaced the offending characters in that GLSA meanwhile...
Another one.... I think Chris's solution would be quite reasonable.

glsa-check -l 200703-26
[A] means this GLSA was already applied,
[U] means the system is not affected and
[N] indicates that the system might be affected.

Traceback (most recent call last):
  File "/usr/bin/glsa-check", line 212, in ?
    sys.exit(summarylist(glsalist))
  File "/usr/bin/glsa-check", line 172, in summarylist
    myglsa = Glsa(myid, glsaconfig)
  File "/usr/lib/gentoolkit/pym/glsa.py", line 414, in __init__
    self.read()
  File "/usr/lib/gentoolkit/pym/glsa.py", line 432, in read
    self.parse(urllib.urlopen(myurl))
  File "/usr/lib/gentoolkit/pym/glsa.py", line 470, in parse
    self.description = getText(myroot.getElementsByTagName("description")[0], format="xml")
  File "/usr/lib/gentoolkit/pym/glsa.py", line 233, in getText
    return str(rValue)
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in position 11: ordinal not in range(128)
A better fix would be to change line 233 to:

    return str(rValue.encode('utf-8'))

The glsa*.xml files are already encoded in UTF-8, so re-encoding them to ASCII (Python's default encoding) is bound to cause problems.
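To make the difference between the two proposed fixes concrete, here is a minimal sketch. It is written for Python 3, whereas the tracebacks in this bug come from Python 2, where str() on a unicode object implicitly used the ASCII codec. The helper names and the sample string are made up for illustration and are not part of glsa.py.

```python
# Hypothetical helpers, not taken from glsa.py. Python 3 syntax.

def to_ascii_strict(text):
    # Roughly what str(rValue) did under Python 2: strict ASCII encoding.
    return text.encode("ascii")

def to_ascii_replace(text):
    # The 'quick fix' from the original report: never raises, but lossy.
    return text.encode("ascii", "replace")

def to_utf8(text):
    # The fix suggested above: keeps every character.
    return text.encode("utf-8")

sample = "Andr\u00e9"  # made-up name containing u'\xe9'

try:
    to_ascii_strict(sample)
except UnicodeEncodeError as exc:
    print("strict ASCII failed:", exc.reason)

print(to_ascii_replace(sample))  # b'Andr?'
print(to_utf8(sample))           # b'Andr\xc3\xa9'
```

The 'replace' variant avoids the crash at the cost of losing the character, which matches the question marks seen in the original report; the UTF-8 variant round-trips back to the original text.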
Did anyone actually test those changes with a GLSA containing non-ASCII characters (with all glsa-check operations)? I have to admit that I'm pretty ignorant when it comes to Unicode issues, so I'm not exactly the most qualified person to test this.

Two things one should be aware of here:
1) The current conversion mainly exists to ensure that we only pass ASCII strings into portage, as portage does a few type checks that would fail with Unicode strings, resulting in even nastier error messages.
2) In recent versions glsa-check got a new --mail option. If glsa.py returned strings containing non-ASCII characters, one would have to make sure that we set the correct MIME type for mails.
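On the second point, a minimal sketch of what declaring the charset on a mail could look like with the standard library's email package. This is Python 3 and is not how glsa-check's --mail option is actually implemented; the function name, addresses and body text are placeholders.

```python
from email.mime.text import MIMEText

def build_report_mail(body, sender, recipient, subject):
    # Hypothetical helper. Passing "utf-8" as the charset makes MIMEText
    # set the Content-Type header accordingly and encode the payload.
    msg = MIMEText(body, "plain", "utf-8")
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject
    return msg

msg = build_report_mail(
    "Reported by Andr\u00e9",   # made-up body with a non-ASCII character
    "root@localhost",
    "admin@localhost",
    "GLSA report",
)
print(msg["Content-Type"])
```

With the charset declared in the header, a mail client has what it needs to decode the body instead of guessing.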
I have the exact same bug, same traceback and everything. I synced an hour ago and just tried the fix provided; it works fine. This should probably be committed once there has been some more testing.
Just synced, same problem with glsa-200706-02.xml. Fix from comment #1 did not work, fix #4 works fine. Please apply this glsa
And another one this morning. This seems like such a simple problem to fix permanently...
I'm suffering too on several boxes.

  File "/usr/lib/gentoolkit/pym/glsa.py", line 233, in getText
    return str(rValue)
UnicodeEncodeError: 'ascii' codec can't encode character u'\xf6' in position 34: ordinal not in range(128)

Is there something I've not done - like emerge python with the right unicode support? If so, I'll do it.
Same problem with glsa-200709-18.xml.

Traceback (most recent call last):
  File "/usr/bin/glsa-check", line 168, in <module>
    myglsa = Glsa(x, glsaconfig)
  File "/usr/lib/gentoolkit/pym/glsa.py", line 441, in __init__
    self.read()
  File "/usr/lib/gentoolkit/pym/glsa.py", line 459, in read
    self.parse(urllib.urlopen(myurl))
  File "/usr/lib/gentoolkit/pym/glsa.py", line 497, in parse
    self.description = getText(myroot.getElementsByTagName("description")[0], format="xml")
  File "/usr/lib/gentoolkit/pym/glsa.py", line 242, in getText
    return str(rValue)
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in position 544: ordinal not in range(128)
*** Bug 194404 has been marked as a duplicate of this bug. ***
Is this ever going to get fixed or are we just going to continue to come back here every 2-3 months and complain about it?
Agreed. As someone who only updates packages based on the cron output of glsa-check -l affected, this is quite important to me. If it stops working, I don't get any notice of security alerts. They already stopped issuing kernel GLSAs (for some reason I can't fathom) - but this needs to be solid. Not all of us have time to read all the bugzilla security entries.
Who's going to change the severity then? :)
Added a few things to my proposed fix at bug 194404 .
All my security scripts fail every time we have some strange GLSA entry, and I fail to get the report by mail. That leaves my servers vulnerable while I think I did my work. PLEASE, someone fix this and change the severity.
Same problem here with gentoolkit-0.2.3-r1.

Fixed by changing line 233 in "gentoolkit/pym/glsa.py"
from:
    return str(rValue)
to:
    return rValue.encode('utf-8')
(In reply to comment #17)
> Same problem here with gentoolkit-0.2.3-r1
>
> Fixed changing line 233 in "gentoolkit/pym/glsa.py"
> from:
> return str(rValue)
> to:
> return rValue.encode('utf-8')

This works fine for me.
Many thanks
(In reply to comment #18)
> This works fine for me.
> Many thanks

It will until you emerge gentoolkit again :-)
Released in gentoolkit-0.2.4_rc2