Bug 87985 - incorrect statement in "Using UTF-8 with Gentoo" guide
Summary: incorrect statement in "Using UTF-8 with Gentoo" guide
Status: RESOLVED FIXED
Alias: None
Product: [OLD] Docs-user
Classification: Unclassified
Component: Desktop Configuration Guide
Hardware: All Linux
Importance: High minor
Assignee: Xavier Neys (RETIRED)
URL: http://www.gentoo.org/doc/en/utf-8.xml
Reported: 2005-04-04 18:06 UTC by gna
Modified: 2005-04-05 04:27 UTC

Description gna 2005-04-04 18:06:15 UTC
The "Using UTF-8 with Gentoo" guide states: "Unicode throws away the traditional single-byte limit of character sets, and even with two bytes per-character this allows a maximum 65,536 characters."

Many people assume Unicode only allows 65,536 characters, but this is incorrect. Current versions of Unicode allow a maximum of 1,114,112 code points (17 planes of 65,536 each, U+0000 through U+10FFFF). See http://www.tbray.org/ongoing/When/200x/2003/04/26/UTF for a good explanation of supplementary planes.
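
As a quick check of the arithmetic behind the 1,114,112 figure, here is a minimal Python sketch. The constant names are only illustrative; `sys.maxunicode` is the interpreter's own report of the highest code point (0x10FFFF on Python 3.3 and later).

```python
import sys

# Unicode code space: 17 planes of 65,536 code points each.
PLANES = 17
CODE_POINTS_PER_PLANE = 0x10000  # 65,536

total_code_points = PLANES * CODE_POINTS_PER_PLANE
highest_code_point = total_code_points - 1

print(total_code_points)        # 1114112
print(hex(highest_code_point))  # 0x10ffff

# On Python 3.3+, the interpreter agrees with this upper bound.
print(sys.maxunicode == highest_code_point)
```

Note that this counts code points, not assigned characters; most of the space is still unassigned.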

Most of this confusion arises because early versions of Unicode did allow for only 65,536 code points. That changed when version 2.0 introduced the surrogate mechanism, and the first characters in the supplementary planes were assigned in version 3.1.
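
To illustrate how a code point beyond the old 65,536 limit is represented, the sketch below splits one into a UTF-16 surrogate pair and compares the result with Python's built-in codecs. The helper name `to_surrogate_pair` is hypothetical, and U+20000 is chosen only as an example of a code point outside the first plane.

```python
def to_surrogate_pair(code_point):
    """Split a supplementary code point (U+10000..U+10FFFF) into UTF-16 surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("not a supplementary code point")
    offset = code_point - 0x10000      # 20-bit offset into the supplementary range
    high = 0xD800 + (offset >> 10)     # top 10 bits go into the high surrogate
    low = 0xDC00 + (offset & 0x3FF)    # bottom 10 bits go into the low surrogate
    return high, low

ch = "\U00020000"  # example code point in plane 2
high, low = to_surrogate_pair(ord(ch))

print(hex(high), hex(low))            # 0xd840 0xdc00
print(ch.encode("utf-16-be").hex())   # d840dc00, matching the pair above
print(len(ch.encode("utf-8")))        # 4 bytes in UTF-8
```

So a single character can need two 16-bit units in UTF-16 and four bytes in UTF-8, which is why "two bytes per character" does not bound the character count.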

The above link might be a good one to add to the list of resources at the bottom of the guide.

Reproducible: Always
Steps to Reproduce:
1. Read the UTF-8 guide
Comment 1 gna 2005-04-04 18:17:42 UTC
Looking at this again, together with the sentence that follows it, it is unclear whether the guide is trying to say that Unicode only allows 65,536 characters or not. Either way, it needs to be rewritten. In practice, 65,536 code points would probably be enough if it weren't for Chinese: there are already more than 65,000 Chinese characters in Unicode, and there are plans to add another 40,000!
Comment 2 Xavier Neys (RETIRED) gentoo-dev 2005-04-05 02:00:31 UTC
You're right.
I rephrased the paragraph and added the link to the 'Char vs bytes' article.

Thanks for reporting.
Comment 3 gna 2005-04-05 04:27:48 UTC
I think you have rewritten the relevant section very clearly and concisely.