Bug 161001 - dev-db/mysql-5.0.26-r2 + PHP = UTF8 Characters messed up
Summary: dev-db/mysql-5.0.26-r2 + PHP = UTF8 Characters messed up
Status: RESOLVED NEEDINFO
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: [OLD] Server
Hardware: AMD64 Linux
Importance: High major
Assignee: Gentoo Linux MySQL bugs team
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2007-01-08 21:11 UTC by vad3R
Modified: 2007-05-20 22:06 UTC

See Also:
Package list:
Runtime testing required: ---


Description vad3R 2007-01-08 21:11:44 UTC
Hi,

I recently updated my minimal MySQL installation on our production web servers from 5.0.26-r1 to 5.0.26-r2. After the update finished, the following problem occurred:

Data from Database (which is inserted UTF-8 encoded) is displayed on our website just like in the DB (Ä,Ö,Ü,...)

After downgrading to r1 again (from binpkg because it disappeared from portage) everything works as expected.


Reproducible: Always

Steps to Reproduce:
1. Install mysql-5.0.26-r2
2. Visit website

Actual Results:  
UTF-8 characters are messed up

Expected Results:  
Well encoded characters

dev-lang/php-5.1.6-r6 USE="apache2 berkdb crypt gdbm iconv ipv6 mysql ncurses nls pcre readline reflection session spl ssl unicode xml zlib (-adabas) -apache -bcmath (-birdstep) -bzip2 -calendar -cdb -cgi -cjk -cli -concurrentmodphp -ctype -curl -curlwrappers -db2 -dbase (-dbmaker) -debug -discard-path -doc (-empress) (-empress-bcs) (-esoob) -exif -fastbuild (-fdftk) (-filepro) -firebird -flatfile -force-cgi-redirect (-frontbase) -ftp -gd -gd-external -gmp -hardenedphp -hash -hyperwave-api -imap (-informix) -inifile -interbase -iodbc -java-external -kerberos -ldap -libedit -mcve -memlimit -mhash -ming -msql -mssql -mysqli -oci8 (-oci8-instant-client) -odbc -pcntl -pdo -pdo-external -pic -posix -postgres -qdbm -recode -sapdb -sasl -sharedext -sharedmem -simplexml -snmp -soap -sockets (-solid) -spell -sqlite (-sybase) (-sybase-ct) -sysvipc -threads -tidy -tokenizer -truetype -vm-goto -vm-switch -wddx -xmlreader -xmlrpc -xmlwriter -xpm -xsl -yaz -zip"

---------------------------------------------------

equery u =dev-db/mysql-5.0.26-r1
[ Searching for packages matching =dev-db/mysql-5.0.26-r1... ]
[ Colour Code : set unset ]
[ Legend        : Left column  (U) - USE flags from make.conf                     ]
[                  : Right column (I) - USE flags packages was installed with ]
[ Found these USE variables for dev-db/mysql-5.0.26-r1 ]
 U I
 + + berkdb      : Adds support for sys-libs/db (Berkeley DB for MySQL)
 - - big-tables  : Make tables contain up to 1.844E+19 rows
 - - cluster     : Add support for NDB clustering.
 - - debug       : Enable extra debug codepaths, like asserts and extra output. If you want to get meaningful backtraces see http://www.gentoo.org/proj/en/qa/backtraces.xml .
 - - embedded    : Build embedded server (libmysqld)
 - - extraengine : Add support for alternative storage engines.
 - + latin1      : Use LATIN1 encoding instead of UTF8.
 - - max-idx-128 : Raise the max index per table limit from 64 to 128
 + - minimal     : Install a very minimal build (disables, for example, plugins, fonts, most drivers, non-critical features)
 - - perl        : Adds support/bindings for the Perl language.
 - - selinux     : !!internal use only!! Security Enhanced Linux support, this must be set by the selinux profile or breakage will occur
 - - srvdir      : Add support for GLEP 20
 + + ssl         : Adds support for Secure Socket Layer connections
 - - static      : !!do not set this during bootstrap!! Causes binaries to be statically linked instead of dynamically
Comment 1 Robin Johnson archtester Gentoo Infrastructure gentoo-dev Security 2007-01-08 22:42:57 UTC
It sounds like your tables don't have the correct character sets marked on them (e.g. they contain UTF8, but are marked as LATIN1).

Please check your 'show create table ...' output.
Comment 2 vad3R 2007-01-08 23:22:29 UTC
The tables are configured correctly. It works with MySQL version 5.0.26-r1. Only -r2 is unusable with PHP. I think it's a problem with libmysqlclient.
Comment 3 Robin Johnson archtester Gentoo Infrastructure gentoo-dev Security 2007-01-08 23:33:23 UTC
Ok, that's really weird.
I checked 5.0.26-r1, and found that there are no changes in our patches between -r1 as of the revision immediately before it was deleted compared to -r2.
# for i in mysql-5.0.26-r1.ebuild mysql-5.0.26-r2.ebuild ; do ebuild $i unpack 1>/dev/null ; done ;
 diff -Nuar /var/tmp/portage/dev-db/mysql-5.0.26-r1/work/mysql  /var/tmp/portage/dev-db/mysql-5.0.26-r2/work/mysql
(no output, so they are identical)
#

vivo: what did you do?
Comment 4 vad3R 2007-01-08 23:47:13 UTC
That's really, really weird. Can I give you more information on this?
Comment 5 Robin Johnson archtester Gentoo Infrastructure gentoo-dev Security 2007-01-09 00:02:36 UTC
Please find out exactly which revision of dev-db/mysql-5.0.26-r1 you were using.
Look for the ebuild under /var/db/pkg, and check its header.
Comment 6 Francesco R. (RETIRED) gentoo-dev 2007-01-09 08:14:58 UTC
I suspect the problem is in one of these two places:
- 105_all_mysql_config_cleanup.patch
  This patch modifies the behaviour of the "mysql_config" executable, which is used by PHP and many others to gather information on how to link against libmysql. It was introduced 2007-01-04/05.
- PHP must use /etc/mysql/my.cnf; check with strace (the cli version is easier) that it reads that config file.

Also check the obvious: my.cnf.

Comment 7 Francesco R. (RETIRED) gentoo-dev 2007-01-09 09:06:05 UTC
> - 105_all_mysql_config_cleanup.patch
>   this patch modify the behaviour of "mysql_config" executable, used by php 

reference bug #156301 "mysql_config wrongly retains too much info from CFLAGS"
Comment 8 vad3R 2007-01-09 11:44:44 UTC
I'm not sure if it has something to do with /etc/mysql/my.cnf. The staging server, which runs -r1, has a my.cnf where latin1 is specified. This is because there's a database running on it that uses latin1. For the web app, the staging server uses the production database, which is UTF-8. Here's the my.cnf from the staging server:

[client]
port                                            = 3306
socket                                          = /var/run/mysqld/mysqld.sock
[mysql]
character-sets-dir=/usr/share/mysql/charsets
default-character-set=latin1
[mysqladmin]
character-sets-dir=/usr/share/mysql/charsets
default-character-set=latin1
[mysqlcheck]
character-sets-dir=/usr/share/mysql/charsets
default-character-set=latin1
[mysqldump]
character-sets-dir=/usr/share/mysql/charsets
default-character-set=latin1
[mysqlimport]
character-sets-dir=/usr/share/mysql/charsets
default-character-set=latin1
[mysqlshow]
character-sets-dir=/usr/share/mysql/charsets
default-character-set=latin1
[myisamchk]
character-sets-dir=/usr/share/mysql/charsets
[myisampack]
character-sets-dir=/usr/share/mysql/charsets
[mysqld_safe]
err-log                                         = /var/log/mysql/mysql.err
[mysqld]
character-set-server           = latin1
default-character-set          = latin1
skip-character-set-client-handshake
innodb_buffer_pool_size = 16M
innodb_additional_mem_pool_size = 2M
innodb_log_file_size = 5M
innodb_log_buffer_size = 8M
set-variable = innodb_log_files_in_group=2
innodb_flush_log_at_trx_commit = 1
innodb_lock_wait_timeout = 50
innodb_data_file_path = ibdata1:1024M:autoextend
innodb_file_per_table
innodb_flush_log_at_trx_commit  = 1
[mysqldump]
quick
max_allowed_packet                      = 16M
[mysql]
[isamchk]
key_buffer                                      = 20M
sort_buffer_size                        = 20M
read_buffer                             = 2M
write_buffer                            = 2M
[myisamchk]
key_buffer                                      = 20M
sort_buffer_size                        = 20M
read_buffer                             = 2M
write_buffer                            = 2M
[mysqlhotcopy]
interactive-timeout
user                                            = mysql
port                                            = 3306
socket                                          = /var/run/mysqld/mysqld.sock
pid-file                                        = /var/run/mysqld/mysqld.pid
log-error                                       = /var/log/mysql/mysqld.err
basedir                                         = /usr
datadir                                         = /var/lib/mysql
skip-locking
key_buffer                                      = 16M
max_allowed_packet                      = 1M
table_cache                             = 64
sort_buffer_size                        = 512K
net_buffer_length                       = 8K
read_buffer_size                        = 256K
read_rnd_buffer_size            = 512K
myisam_sort_buffer_size         = 8M
language                                        = /usr/share/mysql/english
log-bin
server-id                                       = 1
tmpdir                                          = /tmp/
Comment 9 vad3R 2007-01-23 12:05:43 UTC
Hello? Is there anything I can do to get this bug solved?
Comment 10 Jaco Kroon 2007-02-16 08:06:34 UTC
You're not the only one who was stuck with this. It has been driving me crazy for the last three months or so, and I finally stumbled onto a solution this morning. Don't ask how; it was a nightmare involving strace, a lot of screaming, and some luck. In /etc/mysql/my.cnf you have this:

[client]
port                                            = 3306
socket                                          = /var/run/mysqld/mysqld.sock
[mysql]
character-sets-dir=/usr/share/mysql/charsets
default-character-set=latin1

This implies that all clients get the right port and socket; however, only the mysql command-line client gets the charset settings. Copying those charset lines from the [mysql] section into the [client] section fixes it.

Could we please get this done by default on shipped my.cnf files?
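
A rough way to check a given my.cnf for this, sketched in Python. It only parses the file text and reports which option groups set default-character-set; it says nothing about what libmysqlclient or PHP actually read at runtime. The MY_CNF excerpt and the charset_for helper are made up for illustration.

```python
import configparser

# Hypothetical excerpt modelled on the my.cnf quoted in this bug.
MY_CNF = """\
[client]
port = 3306
socket = /var/run/mysqld/mysqld.sock
[mysql]
character-sets-dir=/usr/share/mysql/charsets
default-character-set=latin1
"""


def charset_for(cnf_text, group):
    """Return default-character-set for one option group, or None if unset."""
    parser = configparser.ConfigParser(
        allow_no_value=True,   # my.cnf allows bare flags such as "quick"
        strict=False,          # tolerate option groups that appear twice
        interpolation=None,    # treat "%" in values literally
    )
    parser.read_string(cnf_text)
    if not parser.has_section(group):
        return None
    return parser.get(group, "default-character-set", fallback=None)


for group in ("client", "mysql"):
    print(group, "->", charset_for(MY_CNF, group))
```

With the excerpt above, only the [mysql] group reports a charset; after copying the two charset lines into [client], both groups should report it.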
Comment 11 vad3R 2007-03-09 11:05:09 UTC
This doesn't work for me. I tried all the versions in portage (with USE=minimal) but the problem stays the same. I won't be able to update MySQL anymore :-(
Comment 12 Jaco Kroon 2007-03-09 11:35:42 UTC
Did you restart the client?
Comment 13 Luca Longinotti (RETIRED) gentoo-dev 2007-03-09 11:40:04 UTC
Check https://forums.gentoo.org/viewtopic-p-3946384.html#3946384: the configuration was moved from my.cnf to php.ini in the newest PHP releases. MySQL itself really has nothing to do with this; it only has its own "latin1" USE flag if you want latin1 instead of the default utf8 charset.
Best regards, CHTEKK.
Comment 14 vad3R 2007-03-09 13:53:48 UTC
I configured php and mysql according to the thread but the problem still exists :-(

here's how the content on the website looks:

Sie benötigen einen SCART-Anschluss an Ihrem TV-Gerät. Außerdem

/etc/php/apache2-php5/php.ini
===============================

; Local Variables:
; tab-width: 4
; End:

; MySQL extensions default connection charset settings
mysql.connect_charset = utf8
mysqli.connect_charset = utf8
pdo_mysql.connect_charset = utf8

[ebuild   R   ] dev-db/mysql-5.0.26-r2  USE="berkdb minimal ssl -big-tables -cluster -debug -embedded -extraengine -latin1 -max-idx-128 -perl (-selinux) -static" 0 kB

[ebuild   R   ] dev-lang/php-5.2.1-r3  USE="apache2 berkdb cli crypt curl gd gdbm iconv ipv6 mysql ncurses nls pcre readline reflection session soap spl ssl unicode xml zlib (-adabas) -apache -bcmath (-birdstep) -bzip2 -calendar -cdb -cgi -cjk -concurrentmodphp -ctype -curlwrappers -db2 -dbase (-dbmaker) -debug -discard-path -doc (-empress) (-empress-bcs) (-esoob) -exif -fastbuild (-fdftk) -filter (-firebird) -flatfile -force-cgi-redirect (-frontbase) -ftp -gd-external -gmp -hash -imap -inifile -interbase -iodbc -java-external -json -kerberos -ldap -ldap-sasl -libedit -mcve -mhash -msql -mssql -mysqli -oci8 (-oci8-instant-client) -odbc -pcntl -pdo -pdo-external -pic -posix -postgres -qdbm -recode -sapdb -sharedext -sharedmem -simplexml -snmp -sockets (-solid) -spell -sqlite -suhosin (-sybase) (-sybase-ct) -sysvipc -threads -tidy -tokenizer -truetype -wddx -xmlreader -xmlrpc -xmlwriter -xpm -xsl -yaz -zip -zip-external" 0 kB





Any more ideas??


Comment 15 Luca Longinotti (RETIRED) gentoo-dev 2007-03-09 16:15:14 UTC
OK, that looks about right if you want everything to be UTF8...
Still, are you sure the tables themselves and the data are correctly in UTF8 in the database? Also, what kind of charset header do you send out in your pages' HTML code? Try forcing the page to be viewed with UTF8 as charset (in Firefox, for example, go to View -> Character Encoding and select "Unicode (UTF-8)").
Generally, charset display issues in a web page can be tracked down like this:
- the page itself (the charset header must be defined, and naturally set to the charset you want)
- the PHP<->MySQL connection (that one you can set with php.ini's mysql.connect_charset, remember to restart Apache!)
- the MySQL database itself (be sure it uses UTF8 itself, for its databases, tables and data)
Best regards, CHTEKK.
Comment 16 vad3R 2007-03-09 16:54:16 UTC
(In reply to comment #15)
> Ok that looks about right if you want everything to be UTF8...
> Still, are you sure the tables themselves and the data are correctly in UTF8 in
> the database? 

I'm sure. If I go back to the -r1 version, the website uses the right encoding everywhere.

> Also, what kind of charset header do you send out in your pages'
> HTML code? Try forcing to view the page with UTF8 as charset (in FireFox fex.
> go to View -> Character Encoding and select "Unicode (UTF-8)".

The pages are sent as UTF-8 and again it works with the earlier version of mysql.

> Generally charset viewing issues in a webpage can be tracked like this:
> -the page itself (the charset header must be defined, and naturally to the
> charset you want)

Should be ok

> -the PHP<->MySQL connection (that one you can set with the php.ini's
> mysql.connection_charset, remember to restart Apache!)

I updated the config and restarted apache. Doesn't change anything

> -the MySQL database itself (be sure it uses UTF8 itself, for it's databases,
> tables and data)

It is OK. I can see the 3 byte UTF-8 representation for the encoded chars.

Regards

Daniel 

Comment 17 Heiko Schlabach 2007-04-05 16:13:40 UTC
(In reply to comment #16)
> (In reply to comment #15)
> > Ok that looks about right if you want everything to be UTF8...
> > Still, are you sure the tables themselves and the data are correctly in UTF8 in
> > the database? 
> 
> I'm sure. If i go back to -r1 version the website uses all the right encoding
> 
To prove this, please provide the output of the following statement: show table status from DB_NAME like 'TABLE_NAME'
Replace DB_NAME and TABLE_NAME appropriately.

> > Also, what kind of charset header do you send out in your pages'
> > HTML code? Try forcing to view the page with UTF8 as charset (in FireFox fex.
> > go to View -> Character Encoding and select "Unicode (UTF-8)".
> 
> The pages are sent as UTF-8 and again it works with the earlier version of
> mysql.
> 
> > Generally charset viewing issues in a webpage can be tracked like this:
> > -the page itself (the charset header must be defined, and naturally to the
> > charset you want)
> 
> Should be ok
> 
Changing the Firefox setting as suggested does not help if the displayed page contains mixed character encodings. Please check whether the tag in the HTML header is exactly like this one: <meta http-equiv='content-type' content='text/html; charset=UTF-8'>
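
If you would rather not read the page source by eye, the check could be scripted roughly like this in Python. This is only a sketch: it looks at the first http-equiv content-type meta tag and ignores the HTTP Content-Type header the server sends, which can also set the charset. The MetaCharset class and declared_charset helper are names invented here.

```python
from html.parser import HTMLParser


class MetaCharset(HTMLParser):
    """Remember the charset from the first http-equiv content-type meta tag."""

    def __init__(self):
        super().__init__()
        self.charset = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta" or self.charset is not None:
            return
        attributes = dict(attrs)
        if (attributes.get("http-equiv") or "").lower() != "content-type":
            return
        # content looks like "text/html; charset=UTF-8"
        for part in (attributes.get("content") or "").split(";"):
            name, _, value = part.strip().partition("=")
            if name.strip().lower() == "charset":
                self.charset = value.strip().lower()


def declared_charset(html):
    """Return the declared charset in lower case, or None if there is none."""
    parser = MetaCharset()
    parser.feed(html)
    return parser.charset


page = "<html><head><meta http-equiv='content-type' content='text/html; charset=UTF-8'></head></html>"
print(declared_charset(page))
```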

> > -the PHP<->MySQL connection (that one you can set with the php.ini's
> > mysql.connection_charset, remember to restart Apache!)
> 
> I updated the config and restarted apache. Doesn't change anything
> 
> > -the MySQL database itself (be sure it uses UTF8 itself, for it's databases,
> > tables and data)
> 
> Is ok. I can see the 3 byte UTF-8 representaion for the encoded chars
> 
> Regards
> 
> Daniel 
> 

What is the output of this script?
<?php
// Open a connection to the MySQL server.
$mysql_connection_id = mysql_connect('localhost', 'mysql_user', 'mysql_password');
// Print the character set used by this client connection.
print mysql_client_encoding($mysql_connection_id);
?>

If all three checks report UTF-8, then the data has probably been encoded twice. Open the output of your website in a hex editor: if the umlaut ü does not show up as #c3bc but as a 4-byte sequence like #c383c2bc, then (taking #c383c2bc as the example) UTF-8 multibyte characters were at some point treated as iso-8859-1 (two single-byte characters) and converted to UTF-8 again, which results in the erroneous #c383c2bc representation.
This happened, for example, in the German documentation for the mysql_client_encoding() function (look at the description and open the output of the website in a hex editor):
http://de.php.net/manual/de/function.mysql-client-encoding.php
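
The byte sequences mentioned above can be reproduced with a few lines of Python. This is just an illustration of the double-encoding effect, not a statement about where in the MySQL/PHP chain it happens on your system; double_encode and repair are helper names made up for this sketch.

```python
def double_encode(text):
    """Encode as UTF-8, misread the bytes as ISO-8859-1, encode as UTF-8 again."""
    return text.encode("utf-8").decode("iso-8859-1").encode("utf-8")


def repair(raw):
    """Undo one round of double encoding (raises if raw was not double-encoded)."""
    return raw.decode("utf-8").encode("iso-8859-1").decode("utf-8")


correct = "ü".encode("utf-8")
broken = double_encode("ü")

print(correct.hex())   # c3bc
print(broken.hex())    # c383c2bc
print(repair(broken))  # ü
```

If the hex dump of your page shows the 4-byte form, the data went through one latin1 step too many somewhere between the database and the browser.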
Comment 18 Jakub Moc (RETIRED) gentoo-dev 2007-05-20 21:25:13 UTC
Please, respond to Comment #17