
Bug 170901

Summary: Uppercase/lowercase don't work correctly with locale it_IT.utf8
Product: Gentoo Linux
Reporter: Daniele Varrazzo <daniele.varrazzo>
Component: [OLD] Core system
Assignee: Gentoo Toolchain Maintainers <toolchain>
Status: RESOLVED NEEDINFO    
Severity: normal    
Priority: High    
Version: unspecified   
Hardware: x86   
OS: Linux   
Whiteboard:
Package list:
Runtime testing required: ---
Attachments: A UTF-8 encoded file containing a query that should return "t"

Description Daniele Varrazzo 2007-03-14 16:56:24 UTC
All the functions related to uppercase/lowercase classification with locale it_IT.utf8 fail. 

A pair of examples:

1) with PostgreSQL

$ initdb --encoding=utf8 --locale=it_IT.utf8 data
$ pg_ctl -D data start
$ psql postgres
postgres=# SELECT 'è' ILIKE 'È'; -- should be t
 ?column?
----------
 f
(1 row)

2) with Python:

>>> import locale
>>> locale.setlocale(locale.LC_ALL, 'it_IT') # this works
'it_IT'
>>> print 'è'.upper()
È
>>> locale.setlocale(locale.LC_ALL, 'it_IT.utf8') # this fails
'it_IT.utf8'
>>> print 'è'.upper() # should be È
è


Reproducible: Always




Portage 2.1.2.2 (default-linux/x86/2006.1/desktop, gcc-3.4.6, glibc-2.5-r0, 2.6.19-gentoo-r5 i686)
=================================================================
System uname: 2.6.19-gentoo-r5 i686 Intel(R) Pentium(R) M processor 1.60GHz
Gentoo Base System release 1.12.9
Timestamp of tree: Sun, 11 Mar 2007 11:50:01 +0000
dev-java/java-config: 1.3.7, 2.0.31
dev-lang/python:     2.4.3-r4
dev-python/pycrypto: 2.0.1-r5
sys-apps/sandbox:    1.2.17
sys-devel/autoconf:  2.13, 2.61
sys-devel/automake:  1.4_p6, 1.5, 1.6.3, 1.7.9-r1, 1.8.5-r3, 1.9.6-r2, 1.10
sys-devel/binutils:  2.16.1-r3
sys-devel/gcc-config: 1.3.14
sys-devel/libtool:   1.5.22
virtual/os-headers:  2.6.17-r2
ACCEPT_KEYWORDS="x86"
AUTOCLEAN="yes"
CBUILD="i686-pc-linux-gnu"
CFLAGS="-O2 -march=i686 -pipe"
CHOST="i686-pc-linux-gnu"
CONFIG_PROTECT="/etc /usr/kde/3.5/env /usr/kde/3.5/share/config /usr/kde/3.5/shutdown /usr/share/X11/xkb /usr/share/config"
CONFIG_PROTECT_MASK="/etc/env.d /etc/env.d/java/ /etc/gconf /etc/java-config/vms/ /etc/php/apache1-php5/ext-active/ /etc/php/apache2-php5/ext-active/ /etc/php/cgi-php5/ext-active/ /etc/php/cli-php5/ext-active/ /etc/revdep-rebuild /etc/terminfo /etc/texmf/web2c"
CXXFLAGS="-O2 -march=i686 -pipe"
DISTDIR="/usr/portage/distfiles"
FEATURES="autoconfig distlocks metadata-transfer sandbox sfperms strict"
GENTOO_MIRRORS="http://distfiles.gentoo.org http://distro.ibiblio.org/pub/linux/distributions/gentoo"
LINGUAS="it en"
MAKEOPTS="-j2"
PKGDIR="/usr/portage/packages"
PORTAGE_RSYNC_OPTS="--recursive --links --safe-links --perms --times --compress --force --whole-file --delete --delete-after --stats --timeout=180 --exclude=/distfiles --exclude=/local --exclude=/packages --filter=H_**/files/digest-*"
PORTAGE_TMPDIR="/var/tmp"
PORTDIR="/usr/portage"
PORTDIR_OVERLAY="/usr/local/portage /usr/portage/local/layman/portage-xgl /usr/portage/local/layman/voip /usr/portage/local/layman/xeffects"
SYNC="rsync://rsync.europe.gentoo.org/gentoo-portage"
USE="X aiglx alsa apache2 arts bash-completion berkdb bitmap-fonts blas bluetooth bzip2 cairo cdparanoia cdr cli cracklib crypt cups dbus dri dvd dvdr emboss encode esd ethereal exif fam fastcgi fbcon firefox fortran gdbm gif glut gpm gstreamer hal iconv ipv6 isdnlog jpeg kde ldap libg++ mad midi mikmod mmx mp3 mpeg musicbrainz ncurses nls nowlistening nptl nptlonly ogg opengl oss pam pcmcia pcre pdf perl png postgres povray ppds pppd python qt qt3 qt4 quicktime readline readlines real reflection samba scanner sdl session spell spl sse sse2 ssl subversion svg tcpd tiff truetype truetype-fonts type1-fonts unicode usb vorbis win32codecs x86 xine xml xorg xprint xv zlib" ALSA_CARDS="ali5451 als4000 atiixp atiixp-modem bt87x ca0106 cmipci emu10k1 emu10k1x ens1370 ens1371 es1938 es1968 fm801 hda-intel intel8x0 intel8x0m maestro3 trident usb-audio via82xx via82xx-modem ymfpci" ALSA_PCM_PLUGINS="adpcm alaw asym copy dmix dshare dsnoop empty extplug file hooks iec958 ioplug ladspa lfloat linear meter mulaw multi null plug rate route share shm softvol" ELIBC="glibc" INPUT_DEVICES="keyboard mouse evdev synaptics" KERNEL="linux" LCD_DEVICES="bayrad cfontz cfontz633 glk hd44780 lb216 lcdm001 mtxorb ncurses text" LINGUAS="it en" USERLAND="GNU" VIDEO_CARDS="radeon fglrx vesa"
Unset:  CTARGET, EMERGE_DEFAULT_OPTS, INSTALL_MASK, LANG, LC_ALL, LDFLAGS, PORTAGE_COMPRESS, PORTAGE_COMPRESS_FLAGS, PORTAGE_RSYNC_EXTRA_OPTS
Comment 1 Daniele Varrazzo 2007-03-14 17:25:36 UTC
Created attachment 113265
A UTF-8 encoded file containing a query that should return "t"
Comment 2 Daniele Varrazzo 2007-03-14 17:28:37 UTC
Sorry, the Python example is bogus. The PostgreSQL one should still be valid, though: it looks like an lc_ctype problem.

I attached a file showing the bug. If the database is created with --encoding=utf8 --locale=it_IT.utf8, the test fails:

$ psql postgres < test.utf8
 ?column?
----------
 f
(1 row)

The test passes if the database is created with --encoding=latin1 --locale=it_IT:

$ iconv -f utf8 -t latin1 < test | psql postgres
 ?column?
----------
 t
(1 row)
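
A possible reading of why the Python example in the description is not a valid test (an assumption, the report does not spell it out): in Python 2, 'è' is a byte string, and str.upper() on byte strings works one byte at a time through the C locale, so it cannot uppercase a two-byte UTF-8 sequence. The sketch below illustrates the idea in Python 3 rather than the Python 2.4 used above; note that Python 3's bytes.upper() is ASCII-only and ignores the locale, so it only approximates the original behaviour.

```python
# Sketch (Python 3): byte-wise case conversion cannot handle UTF-8 text.
# The variable names are illustrative and do not come from the bug report.

lower = "\u00e8"  # è, LATIN SMALL LETTER E WITH GRAVE
upper = "\u00c8"  # È, LATIN CAPITAL LETTER E WITH GRAVE

utf8_lower = lower.encode("utf-8")  # two bytes: C3 A8
utf8_upper = upper.encode("utf-8")  # two bytes: C3 88

# bytes.upper() maps only ASCII letters, so the UTF-8 bytes stay unchanged.
bytewise = utf8_lower.upper()

# Decoding first lets the conversion operate on the code point instead.
decoded = utf8_lower.decode("utf-8").upper().encode("utf-8")

print(utf8_lower, utf8_upper, bytewise, decoded)
```

Decoding to a text string before changing case sidesteps the locale entirely, which is why this kind of test says nothing about lc_ctype handling in the C library or in PostgreSQL.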
Comment 3 SpanKY gentoo-dev 2007-03-16 23:05:02 UTC
i would guess that your terminal is causing this inconsistency ...

your terminal needs to be set up for both UTF8 input/output in order for this test to be valid ... by changing in a shell on the fly via `export LC_ALL`, you would get weird behavior in pretty much all terminals
Comment 4 Daniele Varrazzo 2007-03-17 16:58:22 UTC
(In reply to comment #3)
> i would guess that your terminal is causing this inconsistency ...
> 
> your terminal needs to be set up for both UTF8 input/output in order for this
> test to be valid ... by changing in a shell on the fly via `export LC_ALL`, you
> would get weird behavior in pretty much all terminals

I've been careful not to be fooled by the console encoding. Anyway, I performed other tests, and the problem seems limited to PostgreSQL rather than the C libraries. I verified the problem with the en_US.utf8 locale too.

I reported the bug to the PostgreSQL team with the following test, which is entirely ASCII (the query is supposed to return 't').

$ initdb --encoding=utf8 --locale=en_US.utf8 en_utf8
$ pg_ctl -D en_utf8 start
$ psql postgres
postgres=# SELECT upper('\xc3\xa8') ILIKE '\xc3\xa8';
 ?column?
----------
 f