There is a common problem with the Turkish "i" variations. The problem is described at http://www.i18nguy.com/unicode/turkish-i18n.html

The following thread, and especially the given comment, show how to apply Turkish rules correctly by calling "setlocale":

thread:  http://bugs.python.org/issue1528802
message: http://bugs.python.org/msg55347

On my system, the given setlocale method does not fix the issue. I have written a simple C program to test that glibc does not have problems with Turkish capitalization, and a few lines of Python to check the same thing in Python. The C example works but the Python example does not. Both should produce a capital "I" with a dot above it under the "tr_TR.utf8" locale. The output is probably viewable under any utf8 locale.

I believe the problem lies in the Python configuration/compile options. Any pointers for further investigation?

C code:

    #include <ctype.h>
    #include <stdio.h>
    #include <locale.h>
    #include <wctype.h>

    int main()
    {
        setlocale(LC_ALL, "tr_TR.utf8");
        printf("%lc\n", towupper('i'));
        return 0;
    }

Python code:

    import locale
    locale.setlocale(locale.LC_ALL,"tr_TR.utf8")
    print u"i".upper()

Reproducible: Always
I really fail to see what you are expecting from us when this bug was already marked as invalid upstream; see http://bugs.python.org/msg55478.
Hi,

The bug was marked invalid upstream because conversions are made correctly once setlocale is called. On Gentoo, even though I explicitly set the locale, the characters are not converted correctly.
Does the patch in bug #250075, which is now in Portage, fix the issue?
(In reply to comment #3)
> Does patch in bug #250075 which is now in Portage fix the issues?

I have installed python-2.5.2-r8, which incorporates the patch in bug #250075, and the problem persists. The 'i' is still capitalised as "dotless capital I" and not as the correct "capital I with a dot above". The mentioned patch seems to fix problems with identifier names; the problem here is with plain unicode strings. Could this be a problem in the python<->glibc interface?
Hello,

Is this problem still present with the new stable dev-lang/python version?

Best regards,
Yes, with python-2.6.2-r1 the bug is still there:

    >>> import locale
    >>> locale.setlocale(locale.LC_ALL,"tr_TR.utf8")
    'tr_TR.utf8'
    >>> repr(u"i".upper())
    "u'I'"

The result should be the "capital I with dot above" Unicode character:

    >>> repr(u"İ")
    "u'\\u0130'"

The C version still works correctly and displays "capital I with dot above". From what I understand from the Python bug [*], the Python version should do the same.

I tried this on an Ubuntu machine and it showed the same error. I also tried it on Pardus [**], a distribution from Turkey, and the Python 2.6.2 interpreter there works correctly with the test above: repr(u"i".upper()) returns ---> u'\\u0130'

The Pardus-specific patches at [***] seem to contain fixes for the i-I problem in _identifier names only_, as the unicode string operations should work well within the correct locale. (That is what the C library does, as shown in the C version, and Python is said to call the underlying C library functions.)

My wild guess is that this is a bug in the C library - Python interface, but I don't have a practical way to test/debug this guess.

[*]:   http://bugs.python.org/issue1528802
[**]:  http://pardus.org.tr/eng/
[***]: http://packages.pardus.org.tr/info/2009/devel/source/python.html
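For anyone who needs Turkish casing regardless of how the interpreter behaves, here is a minimal workaround sketch. It rests on one assumption: Python's unicode string methods apply the default Unicode case mapping, so the current locale does not influence them. The sketch is written for Python 3, where every str is unicode, and the helper names turkish_upper and turkish_lower are hypothetical, not part of the standard library. It simply remaps the dotted and dotless i pairs before calling the built-in methods.

```python
# Workaround sketch (Python 3). turkish_upper / turkish_lower are
# hypothetical helper names, not part of the standard library.

# Turkish casing pairs:  i <-> \u0130 (capital I with dot above)
#                        \u0131 (dotless small i) <-> I
_TR_UPPER = {ord("i"): "\u0130", ord("\u0131"): "I"}
_TR_LOWER = {ord("\u0130"): "i", ord("I"): "\u0131"}


def turkish_upper(text):
    """Uppercase text using the Turkish dotted/dotless i rules."""
    # Remap the two special letters first, then let str.upper()
    # handle every other character with the default mapping.
    return text.translate(_TR_UPPER).upper()


def turkish_lower(text):
    """Lowercase text using the Turkish dotted/dotless i rules."""
    return text.translate(_TR_LOWER).lower()


if __name__ == "__main__":
    print(ascii("i".upper()))           # default Unicode mapping
    print(ascii(turkish_upper("i")))    # Turkish mapping
```

This does not answer whether the interpreter should honour the locale here; it only sidesteps the question for plain unicode strings.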
Resolving as UPSTREAM.