162493 – app-portage/gentoolkit - glsa-check failing to handle unicode character


Status:	RESOLVED FIXED

Alias:	None

Product:	Portage Development
Classification:	Unclassified
Component:	Tools
Hardware:	All Linux

Importance:	High minor with 1 vote
Assignee:	Portage Tools Team

Keywords:	InVCS

Duplicates (2):	162532 194404
Depends on:
Blocks:	170220 172955 181170 186549 194356

Reported:	2007-01-17 06:48 UTC by Chris Gottbrath
Modified:	2008-02-21 01:51 UTC
CC List:	20 users



Description Chris Gottbrath 2007-01-17 06:48:22 UTC

When I run any glsa-check query that reads GLSA 200701-12, it chokes on
the Unicode characters in someone's name.


Reproducible: Always

Steps to Reproduce:
1. glsa-check -l 200701-12

Actual Results:  
% glsa-check -l 200701-12
[A] means this GLSA was already applied,
[U] means the system is not affected and
[N] indicates that the system might be affected.

Traceback (most recent call last):
  File "/usr/bin/glsa-check", line 205, in ?
    sys.exit(summarylist(glsalist))
  File "/usr/bin/glsa-check", line 171, in summarylist
    myglsa = Glsa(myid, glsaconfig)
  File "/usr/lib/gentoolkit/pym/glsa.py", line 414, in __init__
    self.read()
  File "/usr/lib/gentoolkit/pym/glsa.py", line 432, in read
    self.parse(urllib.urlopen(myurl))
  File "/usr/lib/gentoolkit/pym/glsa.py", line 470, in parse
    self.description = getText(myroot.getElementsByTagName("description")[0], format="xml")
  File "/usr/lib/gentoolkit/pym/glsa.py", line 233, in getText
    return str(rValue)
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in position 8: ordinal not in range(128)


Expected Results:  
glsa-check should handle unicode properly.

In my environment the following 'quick fix' avoids the crash but replaces the two characters with question marks. I suspect this is not the 'correct' fix. 

% diff -c /usr/lib/gentoolkit/pym/glsa.py /home/chrisg/temp/
*** /usr/lib/gentoolkit/pym/glsa.py     Wed Jan 17 01:13:07 2007
--- /home/chrisg/temp/glsa.py   Wed Jan 17 02:25:48 2007
***************
*** 230,235 ****
--- 230,236 ----
        if format == "strip":
                rValue = rValue.strip(" \n\t")
                rValue = re.sub("[\s]{2,}", " ", rValue)
+       rValue=rValue.encode('ascii','replace')  # fix to handle unicode input
        return str(rValue)

  def getMultiTagsText(rootnode, tagname, format):

Comment 1 Jakub Moc (RETIRED) gentoo-dev 2007-01-17 13:24:12 UTC

*** Bug 162532 has been marked as a duplicate of this bug. ***

Comment 2 Jakub Moc (RETIRED) gentoo-dev 2007-01-17 13:49:22 UTC

just emerge --sync; we've replaced the offending characters in that GLSA meanwhile...

Comment 3 Richard Benjamin Voigt 2007-04-01 03:01:16 UTC

Another one....  I think Chris's solution would be quite reasonable.

glsa-check -l 200703-26
[A] means this GLSA was already applied,
[U] means the system is not affected and
[N] indicates that the system might be affected.

Traceback (most recent call last):
  File "/usr/bin/glsa-check", line 212, in ?
    sys.exit(summarylist(glsalist))
  File "/usr/bin/glsa-check", line 172, in summarylist
    myglsa = Glsa(myid, glsaconfig)
  File "/usr/lib/gentoolkit/pym/glsa.py", line 414, in __init__
    self.read()
  File "/usr/lib/gentoolkit/pym/glsa.py", line 432, in read
    self.parse(urllib.urlopen(myurl))
  File "/usr/lib/gentoolkit/pym/glsa.py", line 470, in parse
    self.description = getText(myroot.getElementsByTagName("description")[0], format="xml")
  File "/usr/lib/gentoolkit/pym/glsa.py", line 233, in getText
    return str(rValue)
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in position 11: ordinal not in range(128)

Comment 4 vicaya 2007-04-01 07:56:44 UTC

A better fix would be changing line 233 to:

return str(rValue.encode('utf-8'));

glsa*.xml is already encoded in UTF-8; re-encoding it to ASCII (Python's default encoding) is bound to cause problems.
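For anyone comparing the two proposed fixes, here is a minimal sketch of how the three encodings behave. It is written for Python 3, where the explicit encode("ascii") call stands in for what str() did implicitly on a unicode object in the Python 2 tracebacks above. The sample string and the function names are made up for illustration and are not taken from glsa.py.

```python
# Hypothetical sample text; NOT the actual content of any GLSA.
sample_text = "Andr\u00e9 reported the issue"


def encode_strict_ascii(text):
    # Stand-in for Python 2's str(unicode_value), which encoded with the
    # default 'ascii' codec and raised on any character above 127.
    return text.encode("ascii")


def encode_ascii_replace(text):
    # The 'quick fix' from the description: never fails, but every
    # non-ASCII character is replaced by a question mark.
    return text.encode("ascii", "replace")


def encode_utf8(text):
    # The fix from comment 4: keeps the character as UTF-8 bytes.
    return text.encode("utf-8")


try:
    encode_strict_ascii(sample_text)
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True

print(strict_failed)
print(encode_ascii_replace(sample_text))
print(encode_utf8(sample_text))
```

The 'replace' variant loses information but always yields pure ASCII; the UTF-8 variant keeps the name intact but hands non-ASCII bytes to whatever consumes the result.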

Comment 5 Marius Mauch (RETIRED) gentoo-dev 2007-05-30 18:17:07 UTC

Did anyone actually test those changes with a GLSA containing non-ASCII characters (with all glsa-check operations)? I have to admit that I'm pretty ignorant when it comes to Unicode issues, so I'm not exactly the most qualified person to test this.
Two things one should be aware of here:
1) The current conversion mainly exists to ensure that we only pass ASCII strings into portage, as portage does a few type checks that would fail with Unicode strings, resulting in even nastier error messages.
2) In recent versions glsa-check got a new --mail option; if glsa.py returned strings containing non-ASCII characters, one would have to make sure that we set the correct MIME type for mails.
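As a rough illustration of point 2, this is one way to declare the charset on a mail body with Python's standard email package. It is only a sketch under assumptions: build_report_mail and its arguments are hypothetical names, the sample text is invented, and this is not how the --mail option in glsa-check is actually implemented.

```python
from email.mime.text import MIMEText


def build_report_mail(body_text, subject):
    # Hypothetical helper: passing "utf-8" as the charset makes the
    # Content-Type header declare it, so mail clients decode the body
    # correctly instead of guessing.
    msg = MIMEText(body_text, "plain", "utf-8")
    msg["Subject"] = subject
    return msg


# Invented sample body containing a non-ASCII character.
mail = build_report_mail("Reported by Andr\u00e9", "GLSA report")
print(mail["Content-Type"])
```

With the charset declared this way, the body can carry UTF-8 text without the receiving client having to guess the encoding.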

Comment 6 Rob M. 2007-06-07 05:40:28 UTC

I have the exact same bug, same traceback and everything. Just synced an hour ago.

Just tried the fix provided; it works fine. This should probably be committed once there has been some more testing.

Comment 7 Sebastian Siewior 2007-06-07 10:14:45 UTC

Just synced, same problem with glsa-200706-02.xml.
Fix from comment #1 did not work, fix #4 works fine. Please apply this glsa

Comment 8 Toby Murray 2007-07-25 13:47:52 UTC

And another one this morning. This seems like such a simple problem to fix permanently...

Comment 9 Calum 2007-07-26 17:12:15 UTC

I'm suffering too on several boxes.

  File "/usr/lib/gentoolkit/pym/glsa.py", line 233, in getText
    return str(rValue)
UnicodeEncodeError: 'ascii' codec can't encode character u'\xf6' in position 34: ordinal not in range(128)

Is there something I've not done - like emerge python with the right unicode support? If so, I'll do it.

Comment 10 Dmitry Karasik 2007-10-01 03:03:17 UTC

Same problem with glsa-200709-18.xml.

Traceback (most recent call last):
  File "/usr/bin/glsa-check", line 168, in <module>
    myglsa = Glsa(x, glsaconfig)
  File "/usr/lib/gentoolkit/pym/glsa.py", line 441, in __init__
    self.read()
  File "/usr/lib/gentoolkit/pym/glsa.py", line 459, in read
    self.parse(urllib.urlopen(myurl))
  File "/usr/lib/gentoolkit/pym/glsa.py", line 497, in parse
    self.description = getText(myroot.getElementsByTagName("description")[0], format="xml")
  File "/usr/lib/gentoolkit/pym/glsa.py", line 242, in getText
    return str(rValue)
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in position 544: ordinal not in range(128)

Comment 11 Jakub Moc (RETIRED) gentoo-dev 2007-10-01 18:18:17 UTC

*** Bug 194404 has been marked as a duplicate of this bug. ***

Comment 12 Toby Murray 2007-10-01 22:01:23 UTC

Is this ever going to get fixed or are we just going to continue to come back here every 2-3 months and complain about it?

Comment 13 Calum 2007-10-02 08:34:29 UTC

Agreed. As someone who only updates packages based on the cron output of glsa-check -l affected, this is quite important to me.
If it stops working, I don't get any notice of security alerts.
They already stopped issuing kernel GLSAs (for some reason I can't fathom) - but this needs to be solid.
Not all of us have time to read all the bugzilla security entries.

Comment 14 Calum 2007-10-02 08:37:06 UTC

Who's going to change the severity then? :)

Comment 15 Gerben Vos 2007-10-02 11:28:49 UTC

Added a few things to my proposed fix at bug 194404.

Comment 16 hexa 2007-10-02 14:07:23 UTC

All my security scripts fail every time we have some strange GLSA entry, and I fail to get the report by mail, leaving my servers vulnerable while I think I did my work.

PLEASE, someone fix this and change the severity.

Comment 17 Andrea 2007-10-02 16:56:58 UTC

Same problem here with gentoolkit-0.2.3-r1

Fixed changing line 233 in "gentoolkit/pym/glsa.py"
from:
return str(rValue)
to:
return rValue.encode('utf-8')

Comment 18 Peter Bichler 2007-10-08 10:33:52 UTC

This works fine for me.
Many thanks


(In reply to comment #17)
> Same problem here with gentoolkit-0.2.3-r1
> 
> Fixed changing line 233 in "gentoolkit/pym/glsa.py"
> from:
> return str(rValue)
> to:
> return rValue.encode('utf-8')
>

Comment 19 hexa 2007-10-08 10:39:38 UTC

(In reply to comment #18)
> This works fine for me.
> Many thanks
> 
> 
It will until you emerge gentoolkit again :-)

Comment 20 Paul Varner (RETIRED) gentoo-dev 2008-02-21 01:51:41 UTC

Released in gentoolkit-0.2.4_rc2
