Hello, guys!

I was migrating to a new quad AMD64 machine running Gentoo with a 2.6.7 kernel, so I installed the standard MySQL 4.0.20 distribution for AMD64. Everything is fine except that /proc/stat shows all threads using just cpu0, and 'top' shows the other 3 CPUs 100% idle. The server is under a substantial load - 'mysqladmin status' says 800 queries/sec - but it simply does not utilize cpu1-3.

# cat /proc/stat | grep 'cpu[0-3]'
cpu0 100078 6285 73133 211489429 1222913 15497 71429
cpu1 1931 6133 7760 212958253 4585 0 9
cpu2 1508 5846 7288 212960523 3503 0 3
cpu3 1210 6042 7268 212962222 1906 0 4

The cpu1-3 'user' value (1st figure) grows a little - I guess that's some system daemons - while cpu0 is doing all the mysql tasks.

After that I upgraded everything with emerge, so as of yesterday the system was up to date. Today I compiled mysql from portage:

cd /usr/portage/dev-db/mysql/
emerge mysql-4.0.20.ebuild

(Sorry, I'm new to Gentoo's installer.) There was a little problem: it compiled mysql 4.0.20 fine, then the DBI/DBD-related Perl modules, and then it wanted to compile mysql 4.0.18 - dunno why. I stopped it, fixed the installation by hand a little (my.cnf), and finally it works fine. But, again, it doesn't use cpu1-3!

I run many small jobs, so each hit takes just 0.02-0.05 sec; that's why the server can take this load, but it still should split it between all CPUs.

Thanks in advance! Hope this will be fixed.
Mike Blazer
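To make the imbalance in the /proc/stat figures above easier to read, here is a minimal sketch that prints each CPU's cumulative 'user' jiffies and its share of the total. The function name cpu_user_share is made up for illustration; it assumes the usual /proc/stat layout, where the first number after the cpuN label is user time.

```shell
#!/bin/sh
# Hypothetical helper (not part of the original report).
# Reads /proc/stat-formatted text on stdin and prints, for each per-CPU
# line (cpu0, cpu1, ...), the 'user' counter and its share of the sum.
# The aggregate "cpu" line is skipped by the regex.
cpu_user_share() {
    awk '
        BEGIN { n = 0; total = 0 }
        /^cpu[0-9]+ / { name[n] = $1; user[n] = $2; total += $2; n++ }
        END {
            for (i = 0; i < n; i++) {
                pct = (total > 0) ? 100 * user[i] / total : 0
                printf "%s %d %.1f%%\n", name[i], user[i], pct
            }
        }
    '
}

# Typical use on a Linux host:
#   cpu_user_share < /proc/stat
```

Fed the four lines from the report, this puts cpu0 at about 95.6% of all user time. Because the counters are cumulative since boot, the share says where the work has gone overall, not whether any single CPU is currently saturated.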
I don't have any idea why this is occurring, and I don't have such a fancy box to play with to reproduce it ;-). As a comparison test, could you get the MySQL amd64 binary package from upstream, install it in /usr/local/mysql-binary (or somewhere else well isolated) and see if it runs over multiple CPUs?
Hello, guys!

I greatly appreciate that you're working on the bug I reported. I just want to comment on it: I guess you can reproduce it on any dual or quad machine. You can install the mysql standard distribution for AMD64 (though I'm not sure whether it's an AMD64-only bug or one that affects any AMD). It could be something between Gentoo and mysql, or even mysql's own bug, though in that case I guess it would have been reported before, and by many users; 4.0.20 is not a fresh build.

The installation is plain and simple: you untar it into any directory, run scripts/mysql_install_db, and it's done. After testing you can just remove the whole directory - all the files are there (which I like most, with my Windoze background :)

Thanks!
Mike Blazer
I'm asking you to do that on your quad box and provide the results here. My dual-proc boxes are all in production use at present and too active to try this on, and I don't have any quads.
My bugreport starts from the standard distribution. I installed it first and saw this cpu inactivity. And then the same happened with the Gentoo portage mysql-4.0.20. Tell me please what kind of results could help you, or if you have some tests - let me run it on my Quad box. Mike
Ah, that wasn't clear from your original report. In your workload, all the work is being sent from multiple threads, yes? I know mysql is limited in that if all the work is coming from a single thread/connection, it will not use all of the power available. You need to have multiple connections to do that.
In fact I have two boxes of the same kind. One of them now runs mysql-standard in production, using just one CPU. The other one I used for experiments. I started this shell one-liner

n=0;while [ $n -lt 1000 ];do n=`expr $n + 1`;mysql -h<host-with-mysql> -u<user> -p<pwd> -e'select * from mysql.user' > /dev/null;done

from 3-4 hosts at the same time. Every time, only the cpu0 'user' time grew; cpu1-3 were inactive and only their 'idle' time grew. The results were the same with the mysql binary distribution and with the one installed from portage.

Please let me know if I can be of some help in tracking down the bug.

PS. To your last question: yes, the DB server that works in production serves 3 frontend servers with 25-30 apache/mod_perl threads each. And I started my shell one-liner from a few servers at once. So, definitely, a lot of requests come from different threads that can't share the same mysql connection, because my apache/mod_perl code doesn't free threads at all: each apache thread keeps its own connection for its lifetime. This all worked with the old DB server (a dual Intel Xeon running Slackware), giving the same load to both CPUs. And this week the only thing I changed is the DB server address in my configs.

Thanks!
Mike
Here is what I see in 'top' after the '1' command (to split by CPU):

top - 16:16:30 up 29 days, 19:14, 2 users, load average: 0.11, 0.05, 0.10
Tasks: 169 total, 1 running, 168 sleeping, 0 stopped, 0 zombie
Cpu0 : 1.3% us, 1.3% sy, 0.0% ni, 94.4% id, 1.7% wa, 0.3% hi, 1.0% si
Cpu1 : 0.0% us, 0.0% sy, 0.0% ni, 100.0% id, 0.0% wa, 0.0% hi, 0.0% si
Cpu2 : 0.0% us, 0.0% sy, 0.0% ni, 100.0% id, 0.0% wa, 0.0% hi, 0.0% si
Cpu3 : 0.0% us, 0.0% sy, 0.0% ni, 100.0% id, 0.0% wa, 0.0% hi, 0.0% si

It's the production machine that runs nothing but mysqld.

Mike
I've just installed a brand new copy of the mysql-4.0.20 standard binary distribution on my other server: a dual Athlon 1400 with Slackware and a rather old 2.4.25 kernel. It's a spare server these days, so I could be sure about the /proc/stat values. Then I ran the same one-liner

n=0;while [ $n -lt 1000 ];do n=`expr $n + 1`;mysql -h<host-with-mysql> -u<user> -p<pwd> -e'select * from mysql.user' > /dev/null;done

from the other 3 machines in the same network at once, and then again with 10,000 cycles. In both cases the /proc/stat 'user' values for cpu0 and cpu1 increase by almost equal amounts, so both CPUs take more or less equal load.

This more or less confirms that something is wrong between mysql and Gentoo. Or it's the newer kernel, but again, in that case I guess there would be many reports.

Thanks!
Mike
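The comparison described here (how much each CPU's 'user' counter grows while the one-liner runs) can be sketched as a diff of two saved /proc/stat snapshots. The function name cpu_user_delta and the file names before.txt / after.txt are placeholders for illustration, not something used in the original tests.

```shell
#!/bin/sh
# Hypothetical helper: print how much each CPU's 'user' counter grew
# between two /proc/stat snapshots saved to files.
#   $1 = snapshot taken before the load, $2 = snapshot taken after it
cpu_user_delta() {
    awk '
        /^cpu[0-9]+ / {
            if (NR == FNR)
                before[$1] = $2                      # first file: baseline
            else
                printf "%s %d\n", $1, $2 - before[$1]  # second file: growth
        }
    ' "$1" "$2"
}

# Typical use:
#   cat /proc/stat > before.txt
#   ... run the test load from the client machines ...
#   cat /proc/stat > after.txt
#   cpu_user_delta before.txt after.txt
```

Roughly equal deltas on every CPU line would match what was seen on the dual Athlon; a delta concentrated on cpu0 would match the quad Opteron.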
Now that you've tested it on that old Athlon, it narrows it down to:

kernel - 2.4 vs. 2.6
arch - x86 vs. amd64
glibc - maybe
Looks like that. Please let me know if I can help you - probably run some tests, patches etc. Thanks! Mike
Do what you can to exclude the variables. Put Gentoo in the Slack box's swap disk space as a mini-install, give it a 2.6 kernel and use the upstream mysql; then we can narrow it down to an arch problem OR a kernel problem.
Oops, in fact I'm not an admin, I'm a Perl programmer. I will ask my admin to upgrade the kernel on the spare server, but the other things - about swap and the mini-install - I don't really understand.

On Slackware I have glibc 2.3.2, while on Gentoo it's 2.3.4. This is what I don't understand: http://directory.fsf.org/glibc.html says 2.3.2 is the stable release and 2.3.3 is a development release. How come Gentoo upgraded it to 2.3.4?

Thanks!
Mike
get your admin to join the CC for this bug please. our glibc is 2.3.3 + snapshot patches from the glibc CVS, as 2.3.4 is too long in the works, and we need stuff it contains already.
Contact me via private email for login information to this server.

Just to recap on what we have: the machine is a quad Opteron 848 with 16 GB RAM, running Gentoo installed from the 2004.1 disk (if I remember right). MySQL, either emerged or the binary from mysql.com, only utilizes one CPU, where it should be using all the CPUs. The same front-end program (Apache web sites) on a dual Intel P4 running Slackware, with the mysql binary from mysql.com, works as expected, using both CPUs equally.

root@voy94:~# cat /etc/gentoo-release
Gentoo Base System version 1.4.16

root@voy94:/usr/local/mysql# bin/mysql -V
bin/mysql Ver 12.22 Distrib 4.0.20, for unknown-linux (x86_64)

Linux voy94.voyeurweb.com 2.6.7-gentoo-r11 #2 SMP Fri Jul 16 17:13:15 PDT x86_64 AMD Opteron(tm) Processor 848 AuthenticAMD GNU/Linux

root@voy94:~# cat /proc/cpuinfo | egrep -e '(processor|MHz|name)'
processor : 0
model name : AMD Opteron(tm) Processor 848
cpu MHz : 2191.090
processor : 1
model name : AMD Opteron(tm) Processor 848
cpu MHz : 2191.090
processor : 2
model name : AMD Opteron(tm) Processor 848
cpu MHz : 2191.090
processor : 3
model name : AMD Opteron(tm) Processor 848
cpu MHz : 2191.090

root@voy94:~# cat /proc/meminfo | grep Total
MemTotal: 15428060 kB
HighTotal: 0 kB
LowTotal: 15428060 kB
SwapTotal: 128512 kB
VmallocTotal: 536870911 kB
HugePages_Total: 0
OK, guys, let's keep writing through the bugzilla interface; there are other subscribers from the Gentoo team.

Pete, voy94 is ready, so you can put Robin's pubkey on it or create a new user for him.

Robin, please note that /raid1 and /raid2 on that machine are RAID50 arrays, though in fact you won't need them.

I have 2 copies of mysql on this machine.

1. The 4.0.20 standard binary distribution for x86_64, installed in /host/mysql.

You can start it with
cd /host/mysql
bin/mysqld_safe --user=mysql --skip-bdb --skip-innodb &
and stop it with
bin/mysqladmin shutdown

2. 4.0.20 built with emerge, installed in the standard places:

/usr/include/mysql/ - .h files
/usr/share/mysql/ - languages, tests, benchmarks
Libraries:
/usr/lib/mysql/*
and
/usr/lib/libmysqlclient.so.12.0.0
/usr/lib/libmysqlclient_r.so.12.0.0
Binaries:
/usr/bin/mysql/*
/usr/sbin/mysqld
Configs:
/etc/mysql/my.cnf

First remove /host/mysql/bin from the $PATH - it's before /usr/bin there:
PATH=/bin:/sbin:/usr/bin:/usr/sbin:/usr/local/bin:/opt/bin:/usr/x86_64-pc-linux-gnu/gcc-bin/3.3

to start:
mysqld_safe --skip-bdb --skip-innodb --datadir=/host/mysql/data &
to stop:
mysqladmin shutdown

If you want to run my one-liners for testing from other machines, you can allow access to the mysql server:

GRANT ALL ON *.* TO username@'hostname or ip' IDENTIFIED BY 'your passwd';
FLUSH PRIVILEGES;

and then, from that machine:

n=0;while [ $n -lt 1000 ];do n=`expr $n + 1`;mysql -h64.156.136.94 -uusername -ppwd -e'select * from mysql.user' > /dev/null;done

This way
cat /proc/stat|grep cpu[0-3]
will show pure mysql timing.

Just to get it all together and save some time.
This is my mysql load to attempt to reproduce the problem. It has to be careful about two specific things:

1. mysql query cache - flushed whenever an insert/update happens, so we need to insert constantly.
2. network bandwidth limitations on the client - so use a test query that is CPU bound and returns very little data.

on server:

CREATE TABLE `test` (
  `i` int(11) NOT NULL auto_increment,
  `a` float default NULL,
  `b` float default NULL,
  `c` float default NULL,
  `d` float default NULL,
  PRIMARY KEY (`i`)
);

on client 1:

INSERT INTO test (a,b,c,d) values (rand(),rand(),rand(),rand());

on client 2:

SELECT sum(length(concat(
  sha1(sha1(sha1(sha1(a)))),
  sha1(sha1(sha1(sha1(b)))),
  sha1(sha1(sha1(sha1(c)))),
  sha1(sha1(sha1(sha1(d))))
))) from test.test order by a,b,c,d;

This query takes ~10 seconds to run when the test table contains ~200000 rows.

From client 1, run the insert query continuously from at least 6 threads (network I/O bound here).
From client 2, run the select query continuously from at least 12 threads (CPU bound here).

mysql will spawn a thread to handle each of these client threads, and the actual computation work will push the load to 100% on all CPUs.
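Running a query "continuously from at least N threads" can be approximated from a plain shell with background loops. The sketch below is only an illustration under that assumption: run_parallel is a made-up name, each "thread" is a separate client process with its own connection, and it is not the /root/mysql.pl script used on the test machine.

```shell
#!/bin/sh
# Hypothetical load driver: start WORKERS background loops, each running
# the given command ITERATIONS times with its output discarded, then wait
# for all of them to finish.
#   run_parallel WORKERS ITERATIONS command [args...]
run_parallel() {
    workers=$1
    iterations=$2
    shift 2
    w=0
    while [ "$w" -lt "$workers" ]; do
        (
            i=0
            while [ "$i" -lt "$iterations" ]; do
                "$@" > /dev/null
                i=$((i + 1))
            done
        ) &
        w=$((w + 1))
    done
    wait
}

# Example (placeholders in <> must be filled in; the query is the INSERT
# from the comment above):
#   run_parallel 6 1000 mysql -h<host-with-mysql> -u<user> -p<pwd> test \
#       -e 'INSERT INTO test (a,b,c,d) values (rand(),rand(),rand(),rand())'
```

Since every iteration starts a new mysql client, part of the cost is process start-up and connection setup on the client side; a driver that holds connections open would put more of the pressure on the server itself.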
Changes made to my.cnf, all under the mysqld section:

datadir = /host/mysql/data #(so I can start mysql with the gentoo scripts)
set-variable = key_buffer=16M
set-variable = max_allowed_packet=4M
set-variable = thread_stack=512K
key_buffer=16M
max_allowed_packet=4M
thread_stack=512K
query_cache_size = 128M
sort_buffer_size = 8K
myisam_sort_buffer_size = 8M

All of these just allow mysql to rise to the hardware of the machine. It probably wouldn't hurt to push some of them, and some of the other more obscure ones, way up as well.
Using another, much simpler test case, to check for throughput instead of actual calculation work:

Uptime: 91 Threads: 5 Questions: 1494978 Slow queries: 0 Opens: 7 Flush tables: 1 Open tables: 1 Queries per second avg: 16428.330
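The average in that status line is simply Questions divided by Uptime in seconds (1494978 / 91). A small sketch that recomputes it from a 'mysqladmin status' line on stdin, with avg_qps being a made-up helper name:

```shell
#!/bin/sh
# Hypothetical helper: recompute the average queries per second from a
# 'mysqladmin status' line read on stdin, as Questions / Uptime.
avg_qps() {
    awk '{
        for (i = 1; i < NF; i++) {
            if ($i == "Uptime:")    uptime = $(i + 1)
            if ($i == "Questions:") questions = $(i + 1)
        }
        if (uptime + 0 > 0) printf "%.3f\n", questions / uptime
    }'
}

# Typical use:
#   mysqladmin status | avg_qps
```

For the line above this prints 16428.330, matching the figure mysql reports. Note that it is an average over the whole uptime, so a short burst on a long-running server barely moves it.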
Robin, I just ran your /root/mysql.pl from 2 different servers and I see the same result: only one CPU works.
Closing this now, since we found it was actually working right and your test load just wasn't enough for it to show up :-).

Thanks for the temporary use of your spectacularly fast machine. I mostly did build tests for amd64 -> ANY cross-compilers, in record time: at 3-6 minutes for each one, it was great (my own machines take over 30 min for each one).