Bug 133287

Summary: utf8 changes in flat_hash
Product: Portage Development
Reporter: Brian Harring (RETIRED) <ferringb>
Component: Core - Ebuild Support
Assignee: Portage team <dev-portage>
Status: RESOLVED FIXED
Severity: normal
Keywords: InVCS, REGRESSION
Priority: High
Version: 2.1
Hardware: All
OS: Linux
URL: http://glep.gentoo.org/glep-0031.html
Whiteboard:
Package list:
Runtime testing required: ---
Bug Depends on:    
Bug Blocks: 115839    

Description Brian Harring (RETIRED) gentoo-dev 2006-05-14 04:04:53 UTC
Flip through the glep: global vars (i.e., metadata) must be ascii 0-127, so forcing flat_hash to encode everything to disk as utf8 is seriously invalid. The attempted encoding should just plain be removed.

The reasons are pretty straightforward: encoding isn't something you set just for writes; you need to maintain it end to end, for both writing *and* reading.  This means literally that the initial load of the metadata from a regen needs to be read in as unicode (it currently isn't), else you'll get chars with 0xf set (for multi-byte unicode glyphs). If you read it in as ascii, you cannot easily convert it to unicode at the marshalling stage, as you're trying to do.
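
To illustrate the end-to-end point, here is a minimal Python 3 sketch. It is not Portage's flat_hash code: the `write_entry` and `read_entry` helpers, the `KEY=value` line layout, and the sample description string are all assumptions made up for the example. It only shows that a value written as utf8 round-trips when the reader decodes with the same encoding, and comes back garbled when it does not.

```python
import os
import tempfile


def write_entry(path, key, value):
    # Hypothetical helper: write one "KEY=value" line, encoded as UTF-8.
    with open(path, "wb") as f:
        f.write(f"{key}={value}\n".encode("utf-8"))


def read_entry(path, encoding):
    # Hypothetical helper: read the line back, decoding with `encoding`.
    with open(path, "rb") as f:
        raw = f.read()
    key, _, value = raw.decode(encoding).rstrip("\n").partition("=")
    return key, value


def roundtrip_demo():
    # Made-up sample value containing the (R) glyph, U+00AE.
    original = "Mini Racer \u00ae"
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        write_entry(path, "DESCRIPTION", original)
        # Reader agrees with the writer: the value survives.
        _, matched = read_entry(path, "utf-8")
        # Reader disagrees with the writer: the two UTF-8 bytes of the
        # glyph are decoded as two separate characters.
        _, mismatched = read_entry(path, "latin-1")
    finally:
        os.remove(path)
    return original, matched, mismatched
```

Calling `roundtrip_demo()` returns the original string, the correctly decoded copy, and the mis-decoded copy, which makes the mismatch easy to inspect.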

Beyond that, the change doesn't trap encoding failures (the (R) glyph in games-sports/miniracer being an easy one to trigger).
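
As a rough sketch of what trapping such a failure could look like, assuming Python 3 and a hypothetical helper named `encode_ascii_or_report` (not part of Portage), the encode step can catch `UnicodeEncodeError` and report the offending key and offset instead of letting the exception propagate:

```python
def encode_ascii_or_report(key, value):
    # Hypothetical helper: returns (encoded_bytes, None) on success,
    # or (None, error_message) when the value is not plain ASCII.
    try:
        return value.encode("ascii"), None
    except UnicodeEncodeError as e:
        # e.start is the index of the first character that failed to encode.
        bad_char = value[e.start]
        return None, f"{key}: non-ASCII character {bad_char!r} at offset {e.start}"
```

A caller would check the second element of the returned pair and skip or flag the entry rather than writing a partial cache file.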

Doing unicode is a good thing, but it's not a simple change, i.e., not something for 2.1 imo. Regardless, glep31 already lays out that the metadata keys *must* be ascii 0-127 (thus ruling out utf8), so the flat_hash change in rev 3328 has to be backed out.

Comment 1 Zac Medico gentoo-dev 2006-05-14 04:42:08 UTC
I've reverted it in r3349.  We'll have to add a check to repoman to make sure that all metadata is plain ascii...
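
A check of that kind might look roughly like the following Python 3 sketch. This is not repoman's actual code or API: the function name `find_non_ascii_metadata` and the dict-of-strings input shape are assumptions for illustration only. It flags every metadata value containing a character outside ascii 0-127.

```python
def find_non_ascii_metadata(metadata):
    """Return {key: [(offset, char), ...]} for values outside ASCII 0-127.

    `metadata` is assumed to be a mapping of metadata key -> string value.
    """
    problems = {}
    for key, value in metadata.items():
        # Collect every character whose code point exceeds 127.
        bad = [(i, ch) for i, ch in enumerate(value) if ord(ch) > 127]
        if bad:
            problems[key] = bad
    return problems
```

An empty result means every value is plain ascii; otherwise each entry points at the exact offsets that would need fixing in the ebuild.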

Comment 2 Zac Medico gentoo-dev 2006-05-14 05:03:44 UTC
released in 2.1_rc1-r1