Bug 170901 - Uppercase/lowercase don't work correctly with locale it_IT.utf8
Summary: Uppercase/lowercase don't work correctly with locale it_IT.utf8
Status: RESOLVED NEEDINFO
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: [OLD] Core system
Hardware: x86 Linux
Importance: High normal
Assignee: Gentoo Toolchain Maintainers
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2007-03-14 16:56 UTC by Daniele Varrazzo
Modified: 2007-03-17 16:58 UTC

See Also:
Package list:
Runtime testing required: ---


Attachments
A UTF-8 encoded file containing a query that should return "t" (test.utf8, 24 bytes, text/plain)
2007-03-14 17:25 UTC, Daniele Varrazzo

Description Daniele Varrazzo 2007-03-14 16:56:24 UTC
All functions related to uppercase/lowercase classification fail with locale it_IT.utf8.

Two examples:

1) with PostgreSQL:

$ initdb --encoding=utf8 --locale=it_IT.utf8 data
$ pg_ctl -D data start
$ psql postgres
postgres=# SELECT 'è' ILIKE 'È'; -- should be t
 ?column?
----------
 f
(1 row)

2) with Python:

>>> import locale
>>> locale.setlocale(locale.LC_ALL, 'it_IT') # this works
'it_IT'
>>> print 'è'.upper()
È
>>> locale.setlocale(locale.LC_ALL, 'it_IT.utf8') # this fails
'it_IT.utf8'
>>> print 'è'.upper() # should be È
è
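A minimal Python 3 sketch of why a byte-oriented upper() can leave "è" unchanged, assuming the Python 2 session above was working on a UTF-8 byte string (an assumption, not something confirmed in this report). In UTF-8, "è" is the two bytes 0xC3 0xA8, and mapping case one byte at a time cannot turn them into "È". Python 3's str.upper() works on Unicode code points independently of the locale, while bytes.upper() only changes ASCII letters:

```python
# Character-wise vs byte-wise case mapping of 'è' (U+00E8).
text = "è"
utf8 = text.encode("utf-8")
print(utf8)                  # b'\xc3\xa8'

# str.upper() maps code points; no C locale is consulted.
print(text.upper())          # È

# bytes.upper() only maps ASCII a-z, so the UTF-8 bytes come back unchanged,
# similar in effect to a per-byte toupper() on a multi-byte character.
print(utf8.upper() == utf8)  # True
```

This is consistent with the reporter's later remark that the Python example is not a valid test of the C library.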


Reproducible: Always




Portage 2.1.2.2 (default-linux/x86/2006.1/desktop, gcc-3.4.6, glibc-2.5-r0, 2.6.19-gentoo-r5 i686)
=================================================================
System uname: 2.6.19-gentoo-r5 i686 Intel(R) Pentium(R) M processor 1.60GHz
Gentoo Base System release 1.12.9
Timestamp of tree: Sun, 11 Mar 2007 11:50:01 +0000
dev-java/java-config: 1.3.7, 2.0.31
dev-lang/python:     2.4.3-r4
dev-python/pycrypto: 2.0.1-r5
sys-apps/sandbox:    1.2.17
sys-devel/autoconf:  2.13, 2.61
sys-devel/automake:  1.4_p6, 1.5, 1.6.3, 1.7.9-r1, 1.8.5-r3, 1.9.6-r2, 1.10
sys-devel/binutils:  2.16.1-r3
sys-devel/gcc-config: 1.3.14
sys-devel/libtool:   1.5.22
virtual/os-headers:  2.6.17-r2
ACCEPT_KEYWORDS="x86"
AUTOCLEAN="yes"
CBUILD="i686-pc-linux-gnu"
CFLAGS="-O2 -march=i686 -pipe"
CHOST="i686-pc-linux-gnu"
CONFIG_PROTECT="/etc /usr/kde/3.5/env /usr/kde/3.5/share/config /usr/kde/3.5/shutdown /usr/share/X11/xkb /usr/share/config"
CONFIG_PROTECT_MASK="/etc/env.d /etc/env.d/java/ /etc/gconf /etc/java-config/vms/ /etc/php/apache1-php5/ext-active/ /etc/php/apache2-php5/ext-active/ /etc/php/cgi-php5/ext-active/ /etc/php/cli-php5/ext-active/ /etc/revdep-rebuild /etc/terminfo /etc/texmf/web2c"
CXXFLAGS="-O2 -march=i686 -pipe"
DISTDIR="/usr/portage/distfiles"
FEATURES="autoconfig distlocks metadata-transfer sandbox sfperms strict"
GENTOO_MIRRORS="http://distfiles.gentoo.org http://distro.ibiblio.org/pub/linux/distributions/gentoo"
LINGUAS="it en"
MAKEOPTS="-j2"
PKGDIR="/usr/portage/packages"
PORTAGE_RSYNC_OPTS="--recursive --links --safe-links --perms --times --compress --force --whole-file --delete --delete-after --stats --timeout=180 --exclude=/distfiles --exclude=/local --exclude=/packages --filter=H_**/files/digest-*"
PORTAGE_TMPDIR="/var/tmp"
PORTDIR="/usr/portage"
PORTDIR_OVERLAY="/usr/local/portage /usr/portage/local/layman/portage-xgl /usr/portage/local/layman/voip /usr/portage/local/layman/xeffects"
SYNC="rsync://rsync.europe.gentoo.org/gentoo-portage"
USE="X aiglx alsa apache2 arts bash-completion berkdb bitmap-fonts blas bluetooth bzip2 cairo cdparanoia cdr cli cracklib crypt cups dbus dri dvd dvdr emboss encode esd ethereal exif fam fastcgi fbcon firefox fortran gdbm gif glut gpm gstreamer hal iconv ipv6 isdnlog jpeg kde ldap libg++ mad midi mikmod mmx mp3 mpeg musicbrainz ncurses nls nowlistening nptl nptlonly ogg opengl oss pam pcmcia pcre pdf perl png postgres povray ppds pppd python qt qt3 qt4 quicktime readline readlines real reflection samba scanner sdl session spell spl sse sse2 ssl subversion svg tcpd tiff truetype truetype-fonts type1-fonts unicode usb vorbis win32codecs x86 xine xml xorg xprint xv zlib" ALSA_CARDS="ali5451 als4000 atiixp atiixp-modem bt87x ca0106 cmipci emu10k1 emu10k1x ens1370 ens1371 es1938 es1968 fm801 hda-intel intel8x0 intel8x0m maestro3 trident usb-audio via82xx via82xx-modem ymfpci" ALSA_PCM_PLUGINS="adpcm alaw asym copy dmix dshare dsnoop empty extplug file hooks iec958 ioplug ladspa lfloat linear meter mulaw multi null plug rate route share shm softvol" ELIBC="glibc" INPUT_DEVICES="keyboard mouse evdev synaptics" KERNEL="linux" LCD_DEVICES="bayrad cfontz cfontz633 glk hd44780 lb216 lcdm001 mtxorb ncurses text" LINGUAS="it en" USERLAND="GNU" VIDEO_CARDS="radeon fglrx vesa"
Unset:  CTARGET, EMERGE_DEFAULT_OPTS, INSTALL_MASK, LANG, LC_ALL, LDFLAGS, PORTAGE_COMPRESS, PORTAGE_COMPRESS_FLAGS, PORTAGE_RSYNC_EXTRA_OPTS
Comment 1 Daniele Varrazzo 2007-03-14 17:25:36 UTC
Created attachment 113265 [details]
A UTF-8 encoded file containing a query that should return "t"
Comment 2 Daniele Varrazzo 2007-03-14 17:28:37 UTC
Sorry, the Python example is bogus. The PostgreSQL one should not be, though: it seems to be an LC_CTYPE problem.

I attached a file showing the bug. If the database is created with --encoding=utf8 --locale=it_IT.utf8, the test fails:

$ psql postgres < test.utf8
 ?column?
----------
 f
(1 row)

The test passes if the database is created with --encoding=latin1 --locale=it_IT and the file is converted to Latin-1:

$ iconv -f utf8 -t latin1 < test.utf8 | psql postgres
 ?column?
----------
 t
(1 row)
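The iconv conversion above can be mimicked in Python 3 as a decode from UTF-8 followed by an encode to Latin-1. The query string below is a hypothetical reconstruction, since the attachment's contents are not shown in this report (this query does happen to encode to 24 bytes in UTF-8, the size listed for test.utf8). The sketch shows that in Latin-1 each accented letter is a single byte, with è (0xE8) and È (0xC8) differing only in bit 0x20, whereas in UTF-8 each of them takes two bytes:

```python
# Hypothetical reconstruction of the test query; the real attachment is not shown.
query = "SELECT 'è' ILIKE 'È';\n"

# What a UTF-8 encoded test file would contain.
utf8_bytes = query.encode("utf-8")

# Equivalent of `iconv -f utf8 -t latin1`: decode UTF-8, re-encode as Latin-1.
latin1_bytes = utf8_bytes.decode("utf-8").encode("latin-1")
print(len(utf8_bytes), len(latin1_bytes))  # 24 22

# Latin-1 is one byte per character, so string indices equal byte offsets.
lower_byte = latin1_bytes[query.index("è")]
upper_byte = latin1_bytes[query.index("È")]
print(hex(lower_byte), hex(upper_byte))    # 0xe8 0xc8
print(lower_byte ^ upper_byte == 0x20)     # True
```

A single-byte case mapping is therefore enough for the Latin-1 database, which may be why only the UTF-8 setup misbehaves.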
Comment 3 SpanKY gentoo-dev 2007-03-16 23:05:02 UTC
i would guess that your terminal is causing this inconsistency ...

your terminal needs to be set up for both UTF8 input/output in order for this test to be valid ... by changing in a shell on the fly via `export LC_ALL`, you would get weird behavior in pretty much all terminals
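The kind of terminal mismatch described here can be illustrated with a short Python 3 sketch (an illustration only, not part of the original discussion): UTF-8 bytes displayed as Latin-1 turn into two unrelated characters, and a lone Latin-1 byte is not valid UTF-8 at all:

```python
utf8 = "è".encode("utf-8")        # b'\xc3\xa8'

# A terminal expecting Latin-1 shows each UTF-8 byte as its own character.
mojibake = utf8.decode("latin-1")
print(mojibake)                   # Ã¨

# A terminal expecting UTF-8 cannot decode the single Latin-1 byte 0xE8.
latin1 = "è".encode("latin-1")    # b'\xe8'
try:
    latin1.decode("utf-8")
    valid_utf8 = True
except UnicodeDecodeError:
    valid_utf8 = False
print(valid_utf8)                 # False
```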
Comment 4 Daniele Varrazzo 2007-03-17 16:58:22 UTC
(In reply to comment #3)
> i would guess that your terminal is causing this inconsistency ...
> 
> your terminal needs to be set up for both UTF8 input/output in order for this
> test to be valid ... by changing in a shell on the fly via `export LC_ALL`, you
> would get weird behavior in pretty much all terminals

I was careful not to be fooled by the console encoding. Anyway, I performed other tests and the problem seems to be limited to PostgreSQL rather than the C libraries. I verified the problem with the en_US.utf8 locale too.

I reported the bug to the PostgreSQL team with the following test, which is entirely ASCII (the query is supposed to return 't').

$ initdb --encoding=utf8 --locale=en_US.utf8 en_utf8
$ pg_ctl -D en_utf8 start
$ psql postgres
postgres=# SELECT upper('\xc3\xa8') ILIKE '\xc3\xa8';
 ?column?
----------
 f
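The escapes in this query spell out the UTF-8 encoding of "è". A quick Python 3 check, assuming the server reads '\xc3\xa8' as those two bytes (as PostgreSQL releases of that era did for backslash escapes in ordinary string literals), shows what upper() should return and that a case-insensitive comparison ought to match. casefold() is used here only as a rough analogue of ILIKE's case-insensitivity, not as PostgreSQL's actual implementation:

```python
# The two bytes written as '\xc3\xa8' in the query above.
raw = b"\xc3\xa8"
lower = raw.decode("utf-8")                  # 'è' (U+00E8)

upper = lower.upper()                        # 'È' (U+00C8)
print(upper.encode("utf-8"))                 # b'\xc3\x88'

# Rough analogue of a case-insensitive match; not PostgreSQL's algorithm.
print(upper.casefold() == lower.casefold())  # True
```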