Bug 204221 - dev-lang/python breaks Turkish capitalization rules
Status: RESOLVED UPSTREAM
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: Current packages
Hardware: All Linux
Importance: High normal
Assignee: Python Gentoo Team
Depends on: 250075
Reported: 2008-01-03 23:12 UTC by Gokdeniz Karadag
Modified: 2012-02-16 13:00 UTC
Description Gokdeniz Karadag 2008-01-03 23:12:38 UTC
There is a common problem with the Turkish "i" variants.
A description of the problem can be found at
http://www.i18nguy.com/unicode/turkish-i18n.html

The following thread, and especially the linked message, shows how to get the Turkish rules applied correctly by calling "setlocale":
thread  : http://bugs.python.org/issue1528802
message : http://bugs.python.org/msg55347

On my system, the setlocale method given there does not fix the issue.
I have written a simple C program to check that glibc itself has no problems with Turkish capitalization, and a few lines of Python code to run the same check in Python. The C example works but the Python example does not. Both should produce a capital "I" with a dot above it (U+0130) under the "tr_TR.utf8" locale. The output is probably viewable under any UTF-8 locale.
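For reference, the four letters involved can be listed with their code points and Unicode names. This is only a small sketch, assuming Python 3 (the test code in this report is Python 2); the helper name letter_names is made up for illustration.

```python
import unicodedata

def letter_names():
    """Return (code point, Unicode name) pairs for the dotted/dotless i letters."""
    letters = ("i", "I", "\u0131", "\u0130")
    return [("U+%04X" % ord(ch), unicodedata.name(ch)) for ch in letters]

for code_point, name in letter_names():
    print(code_point, name)
```

In Turkish, "i" pairs with U+0130 and U+0131 pairs with "I", which is the mapping the tests in this report check for.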

I believe that the problem lies in Python's configuration or compile options. Any pointers for further investigation?


C code:

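#include <locale.h>
#include <stdio.h>
#include <wchar.h>
#include <wctype.h>

int main(void) {
  /* Switch to Turkish rules; this fails if tr_TR.utf8 is not installed. */
  if (setlocale(LC_ALL, "tr_TR.utf8") == NULL) {
    fprintf(stderr, "tr_TR.utf8 locale is not available\n");
    return 1;
  }
  /* towupper() is locale-aware: L'i' should map to U+0130 here. */
  printf("%lc\n", towupper(L'i'));
  return 0;
}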

Python code: 

import locale
locale.setlocale(locale.LC_ALL,"tr_TR.utf8")
print u"i".upper()

Reproducible: Always

Steps to Reproduce:
Comment 1 Jakub Moc (RETIRED) gentoo-dev 2008-01-03 23:35:09 UTC
I really fail to see what you are expecting from us when this bug was already marked as invalid upstream; see http://bugs.python.org/msg55478.
Comment 2 Gokdeniz Karadag 2008-01-04 11:48:19 UTC
Hi,

The bug was marked invalid upstream because, after setlocale is called, the conversions are made correctly.

On Gentoo, even though I set the locale explicitly, the characters are not converted correctly.

Comment 3 Serkan Kaba (RETIRED) gentoo-dev 2009-01-29 08:47:22 UTC
Does the patch in bug #250075, which is now in Portage, fix the issue?
Comment 4 Gokdeniz Karadag 2009-01-29 10:06:46 UTC
(In reply to comment #3)
> Does patch in bug #250075 which is now in Portage fix the issues?
> 

I have installed python-2.5.2-r8, which incorporates the patch from bug #250075, and the problem persists. The 'i' is still capitalised as "dotless capital I" and not as the correct "capital I with a dot above".

The mentioned patch seems to fix problems with identifier names, whereas the problem here is with plain unicode strings. Could this be a problem in the Python <-> glibc interface?
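One way to probe that question is to compare the result of upper() with and without the locale switch. The sketch below is written for Python 3, which is an assumption since this report is about Python 2, and the helper name upper_under_locale is made up. As far as I know, the Unicode string methods follow the Unicode default case mappings rather than the C library's locale tables, so the result is expected to stay a plain "I" even when setlocale succeeds.

```python
import locale

def upper_under_locale(text, locale_name):
    """Upper-case text after trying to switch to locale_name (hypothetical helper)."""
    saved = locale.setlocale(locale.LC_ALL)  # query the current setting
    try:
        locale.setlocale(locale.LC_ALL, locale_name)
        switched = True
    except locale.Error:
        switched = False  # the locale is not installed on this machine
    try:
        return switched, text.upper()
    finally:
        locale.setlocale(locale.LC_ALL, saved)  # restore the previous locale

switched, result = upper_under_locale("i", "tr_TR.utf8")
print("locale switched:", switched)
print("result:", ascii(result))
```

If the locale switch succeeds and the result is still "I", the mapping is not being taken from glibc at all, which would point away from the interface between the two.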

Comment 5 Jesus Rivero (RETIRED) gentoo-dev 2009-09-20 18:49:15 UTC
Hello, 

  Is this problem still present with the new stable dev-lang/python version?

  Best regards,
Comment 6 Gokdeniz Karadag 2009-09-20 20:54:58 UTC
Yes, with python-2.6.2-r1 the bug is still there:

>>> import locale
>>> locale.setlocale(locale.LC_ALL,"tr_TR.utf8")
'tr_TR.utf8'

>>> repr(u"i".upper())
"u'I'"

The result should be the "capital I with dot above" Unicode character:
>>> repr(u"İ")
"u'\\u0130'"

The C version still works correctly and displays "capital I with dot above". From what I understand from the Python bug *, the Python version should do the same.
I tried this on an Ubuntu machine and it showed the same error. I also tried it on Pardus **, a distribution from Turkey, and the Python 2.6.2 interpreter there works correctly with the test above:
repr(u"i".upper()) returns "u'\\u0130'"

*: http://bugs.python.org/issue1528802
**: http://pardus.org.tr/eng/

The Pardus-specific patches at *** seem to contain fixes for the i-I problem in _identifier names only_, since the unicode string operations should already work well within the correct locale. (That is what the C library does, as shown by the C version, and Python is said to call the underlying C library functions.)

***: http://packages.pardus.org.tr/info/2009/devel/source/python.html

My wild guess is that this is a bug in the interface between the C library and Python, but I don't have a practical way to test or debug this guess.
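Wherever the cause lies, one possible workaround is to apply the two Turkish-specific pairs by hand before calling upper() or lower(). This is only a sketch under assumptions: it is written for Python 3, the names turkish_upper and turkish_lower are made up, and it is not what Pardus or upstream Python does.

```python
# Hypothetical helpers, not part of the Python standard library.
DOTTED_CAPITAL_I = "\u0130"  # capital I with dot above
DOTLESS_SMALL_I = "\u0131"   # small dotless i

def turkish_upper(text):
    """Upper-case text, mapping i to U+0130 and U+0131 to I first."""
    text = text.replace("i", DOTTED_CAPITAL_I).replace(DOTLESS_SMALL_I, "I")
    return text.upper()

def turkish_lower(text):
    """Lower-case text, mapping U+0130 to i and I to U+0131 first."""
    text = text.replace(DOTTED_CAPITAL_I, "i").replace("I", DOTLESS_SMALL_I)
    return text.lower()

print(ascii(turkish_upper("i")))
print(ascii(turkish_lower("I")))
```

The replacements run before the generic case conversion so that the default mapping never sees the four special letters in the wrong form.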
Comment 7 Dirkjan Ochtman (RETIRED) gentoo-dev 2012-02-16 13:00:12 UTC
Resolving as UPSTREAM.