On a hardened x86 system with mysql 5.0.38 and berkdb.mysql useflags, a restart or stop of mysql isn't possible: the process stays active. killall mysqld or killall "pid-number" doesn't work; only "kill -s 9 pid" stops mysql. Downgrading to 5.0.26-r2 resolves the problem.

Reproducible: Always

Steps to Reproduce:
1. emerge mysql
2. emerge --config mysql
or
1. /etc/init.d/mysql restart
or
1. /etc/init.d/mysql stop

Actual Results:
"stopping mysql" .. *hang*

Expected Results:
process(es) should have been stopped

emerge --info
Portage 2.1.2.2 (hardened/x86/2.6, gcc-3.4.6, glibc-2.3.6-r5, 2.6.19-gentoo-r5 i686)
=================================================================
System uname: 2.6.19-gentoo-r5 i686 Intel(R) Pentium(R) 4 CPU 3.20GHz
Gentoo Base System release 1.12.9
Timestamp of tree: Mon, 16 Apr 2007 05:20:01 +0000
dev-lang/python: 2.3.5-r3, 2.4.3-r4
dev-python/pycrypto: 2.0.1-r5
sys-apps/sandbox: 1.2.17
sys-devel/autoconf: 2.13, 2.60
sys-devel/automake: 1.4_p6, 1.5, 1.6.3, 1.7.9-r1, 1.8.5-r3, 1.9.6-r2, 1.10
sys-devel/binutils: 2.16.1-r3
sys-devel/gcc-config: 1.3.15-r1
sys-devel/libtool: 1.5.22
virtual/os-headers: 2.6.17-r2
ACCEPT_KEYWORDS="x86"
AUTOCLEAN="yes"
CBUILD="i386-pc-linux-gnu"
CFLAGS="-O2 -march=pentium4"
CHOST="i386-pc-linux-gnu"
CONFIG_PROTECT="/etc"
CONFIG_PROTECT_MASK="/etc/env.d /etc/gconf /etc/revdep-rebuild /etc/terminfo"
CXXFLAGS="-O2 -march=pentium4"
DISTDIR="/usr/portage/distfiles"
FEATURES="distlocks metadata-transfer sandbox sfperms strict"
GENTOO_MIRRORS="ftp://gentoo.inode.at/source/ ftp://gd.tuwien.ac.at/opsys/linux/gentoo/ ftp://ftp.belnet.be/mirror/rsync.gentoo.org/gentoo/ "
MAKEOPTS="-j3"
PKGDIR="/usr/portage/packages"
PORTAGE_RSYNC_OPTS="--recursive --links --safe-links --perms --times --compress --force --whole-file --delete --delete-after --stats --timeout=180 --exclude=/distfiles --exclude=/local --exclude=/packages --filter=H_**/files/digest-*"
PORTAGE_TMPDIR="/var/tmp"
PORTDIR="/usr/portage"
SYNC="rsync://rsync.europe.gentoo.org/gentoo-portage"
USE="apache2 berkdb crypt hardened midi nls no-old-linux pam pic readline ssl tcpd x86 xorg zlib" ALSA_PCM_PLUGINS="adpcm alaw asym copy dmix dshare dsnoop empty extplug file hooks iec958 ioplug ladspa lfloat linear meter mulaw multi null plug rate route share shm softvol" ELIBC="glibc" INPUT_DEVICES="mouse keyboard" KERNEL="linux" LCD_DEVICES="bayrad cfontz cfontz633 glk hd44780 lb216 lcdm001 mtxorb ncurses text" USERLAND="GNU"
Unset: CTARGET, EMERGE_DEFAULT_OPTS, INSTALL_MASK, LANG, LC_ALL, LDFLAGS, LINGUAS, PORTAGE_COMPRESS, PORTAGE_COMPRESS_FLAGS, PORTAGE_RSYNC_EXTRA_OPTS, PORTDIR_OVERLAY
sharky2: please killall -9 mysqld at that point, and get a backtrace as to where it was during the running. Also advise if newer GCC/glibc still cause the problem to be present - I don't have any machines that are still on GCC3/glibc-2.3, I'm on 4.1/2.5 everywhere.
(In reply to comment #1)
> Also advise if newer GCC/glibc still cause the
> problem to be present - I don't have any machines that are still on
> GCC3/glibc-2.3, I'm on 4.1/2.5 everywhere.

gcc-4* and >=glibc-2.4 are package.masked on hardened (yeah, sucks).
I will set up another machine to try to reproduce the problem, because I ran out of time for testing on this production server. How do I get a backtrace? Is there a short howto?
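For anyone else wondering about the backtrace: one common way is to attach gdb to the mysqld that refuses to stop and dump the stack of every thread. A minimal sketch follows; `dump_backtrace`, the `GDB` override variable and the default output path are names made up for illustration, and the result is only readable if mysqld was built unstripped with debug symbols.

```shell
# Hypothetical helper: dump_backtrace PID [OUTFILE]
# Attaches gdb to a running process and writes a backtrace of all threads.
dump_backtrace() {
    pid=$1
    outfile=${2:-/tmp/mysqld.bt}   # assumed default location, pick your own
    # ${GDB:-gdb} lets the debugger binary be overridden; gdb is the default.
    # -p attaches to the PID, -batch exits once the commands have run,
    # -ex runs a single gdb command, here a backtrace of every thread.
    ${GDB:-gdb} -p "$pid" -batch -ex "thread apply all bt" > "$outfile" 2>&1
}
```

Run it as root while "/etc/init.d/mysql stop" is hanging, for example `dump_backtrace 6748` with the PID of the stuck mysqld, then attach the output file to the bug. On a hardened kernel, attaching may be restricted, so treat this as a starting point only.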
I can confirm this problem on hardened x86.
Same here :-(((

Strangely, my /var/run/mysqld/mysqld.pid does not contain the PID of the mysqld parent (6748), but the PID of some mysqld child process (6764).

strace of the master process shows no activity at all when I run "kill 6748":

--8<--
keel mysqld # strace -tt -p 6748
Process 6748 attached - interrupt to quit
13:57:26.283736 select(17, [15 16], NULL, NULL, NULL
--8<--

Same goes for SIGHUP for example.

When I run "kill 6764" (the PID from the pidfile), then this is being logged:

--8<--
070417 13:59:24 [Note] /usr/sbin/mysqld: Normal shutdown
--8<--

But nothing more happens.
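The pidfile mismatch described above can be checked mechanically. This is a rough sketch, not what the init script does; `pidfile_matches` is a hypothetical helper that only compares the recorded PID with one you pass in.

```shell
# Hypothetical helper: pidfile_matches PIDFILE PID
# Returns 0 if PIDFILE records exactly PID, 1 if it records something else,
# and 2 if the pidfile cannot be read at all.
pidfile_matches() {
    pidfile=$1
    expected=$2
    [ -r "$pidfile" ] || return 2
    # Strip the trailing newline and any other whitespace before comparing.
    recorded=$(tr -d '[:space:]' < "$pidfile")
    [ "$recorded" = "$expected" ]
}
```

A possible use is `pidfile_matches /var/run/mysqld/mysqld.pid "$(pgrep -o mysqld)"`, where `pgrep -o` picks the oldest mysqld. That the oldest process is the parent is an assumption.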
I tried to get a backtrace, but I failed :(

1. env USE="debug" FEATURES="nostrip" emerge mysql

--8<--
/usr/sbin/mysqld: ELF 32-bit LSB shared object, Intel 80386, version 1 (SYSV), for GNU/Linux 2.4.1, not stripped
--8<--

2. gdb:

--8<--
# gdb --args /usr/sbin/mysqld --console --core-file --debug='d:t:i:o,/tmp/mysqld.trace' --gdb
GNU gdb 6.6
Copyright (C) 2006 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "i386-pc-linux-gnu"...
Using host libthread_db library "/lib/libthread_db.so.1".
(gdb) run
Starting program: /usr/sbin/mysqld --console --core-file --debug=d:t:i:o,/tmp/mysqld.trace --gdb
070417 15:24:12 [Warning] The syntax for replication startup options is deprecated and will be removed in MySQL 5.2. Please use 'CHANGE MASTER' instead.

Program received signal SIG32, Real-time event 32.
0x4fb63c74 in ?? ()
(gdb) bt
#0 0x4fb63c74 in ?? ()
#1 0x5a2e27e8 in ?? ()
#2 0x4fb63458 in ?? ()
#3 0x5a2e2750 in ?? ()
#4 0x00000020 in ?? ()
Backtrace stopped: previous frame inner to this frame (corrupt stack?)
--8<--

3. I added this to /etc/conf.d/mysql:

--8<--
mysql_slot_0=( "core-file" "debug=d:t:i:o,/tmp/mysqld.trace" "gdb" )
--8<--

Then I started mysql and looked at /tmp/mysqld.trace. After running "/etc/init.d/mysql stop", lines like this keep getting written to the trace file:

--8<--
T@180236: | | info: Waiting for select thread
--8<--

Any hints?
I just freshly emerged mysql-5.0.38 on another hardened x86 system, which I set up two weeks after the first one following the same scheme. mysqld on this 2nd machine stops just fine... argh :(
Ok I found something out:
- machine #1 where the problem occurs has CHOST=i386-[...]
- machine #2 where the problem does NOT occur has CHOST=i686-[...]

On machine #1, I can see many mysqld processes in the default process listing (they do not run as threads there). On machine #2, I only see one mysqld process in the default process listing, but using 'ps -efL' I can see many mysqld threads. Both gcc and glibc were merged with identical USE flags (nptl nptlonly etc.). The only difference I can see is the CHOST.

Ok, another finding: I looked at /usr/portage/sys-libs/glibc/glibc-2.3.6-r5.ebuild and found out that TLS (Thread Local Storage) is only supported from i486 upwards (see function want_tls(), line #799 in the aforementioned ebuild). I guess that's the problem? I installed the machine from a hardened stage3 which uses CHOST=i386-[...]. Will rebuild everything now (fun fun fun).
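To spot affected boxes quickly, the CHOST finding can be turned into a small check. This sketch only encodes the statement above (TLS from i486 upwards on x86); it is not copied from want_tls() in the ebuild, and `chost_supports_tls` is a made-up name.

```shell
# Hypothetical helper: chost_supports_tls CHOST
# Returns 1 for an i386 CHOST (no TLS, hence no NPTL, per the comment above),
# 0 for i486/i586/i686, and 2 for anything this sketch does not cover.
chost_supports_tls() {
    case $1 in
        i386-*)     return 1 ;;
        i[4-6]86-*) return 0 ;;
        *)          return 2 ;;   # not an ix86 CHOST, no verdict here
    esac
}
```

On a Gentoo box the value could be fed in with `chost_supports_tls "$(portageq envvar CHOST)"`.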
See bug #106556 (i386 -> no nptl). Still recompiling though...
Argh, wrong radio button... sorry!
I have CHOST="i686-pc-linux-gnu". mysql 5.0.34 works fine for me.

emerge --info
Portage 2.1.2.4 (hardened/x86/2.6, gcc-3.4.6, glibc-2.3.6-r5, 2.6.20.4-grsec i686)
=================================================================
System uname: 2.6.20.4-grsec i686 AMD Athlon(tm) MP 1800+
Gentoo Base System release 1.12.10
Timestamp of tree: Tue, 17 Apr 2007 10:00:01 +0000
ccache version 2.4 [enabled]
dev-lang/python: 2.4.4
dev-python/pycrypto: 2.0.1-r5
dev-util/ccache: 2.4-r6
sys-apps/sandbox: 1.2.18.1
sys-devel/autoconf: 2.13, 2.60
sys-devel/automake: 1.4_p6, 1.5, 1.6.3, 1.7.9-r1, 1.8.5-r3, 1.9.6-r2, 1.10
sys-devel/binutils: 2.17
sys-devel/gcc-config: 1.3.16
sys-devel/libtool: 1.5.23b
virtual/os-headers: 2.6.20-r2
ACCEPT_KEYWORDS="x86 ~x86"
AUTOCLEAN="yes"
CBUILD="i686-pc-linux-gnu"
CFLAGS="-O2 -march=athlon-xp -mtune=athlon-xp -mmmx -msse -m3dnow -pipe"
CHOST="i686-pc-linux-gnu"
CONFIG_PROTECT="/etc /usr/share/X11/xkb /var/qmail/alias /var/qmail/control /var/vpopmail/domains /var/vpopmail/etc"
CONFIG_PROTECT_MASK="/etc/env.d /etc/gconf /etc/php/apache1-php5/ext-active/ /etc/php/apache2-php5/ext-active/ /etc/php/cgi-php5/ext-active/ /etc/php/cli-php5/ext-active/ /etc/revdep-rebuild /etc/terminfo"
CXXFLAGS="-O2 -march=athlon-xp -mtune=athlon-xp -mmmx -msse -m3dnow -pipe"
DISTDIR="/usr/portage/distfiles"
EMERGE_DEFAULT_OPTS="--with-bdeps=y"
FEATURES="candy ccache distlocks metadata-transfer parallel-fetch sandbox sfperms strict userfetch userpriv usersandbox"
GENTOO_MIRRORS="http://ds.thn.htu.se/linux/gentoo ftp://ftp.du.se/pub/os/gentoo http://distfiles.gentoo.org http://distro.ibiblio.org/pub/Linux/distributions/gentoo"
MAKEOPTS=""
PKGDIR="/usr/portage/packages"
PORTAGE_RSYNC_OPTS="--recursive --links --safe-links --perms --times --compress --force --whole-file --delete --delete-after --stats --timeout=180 --exclude=/distfiles --exclude=/local --exclude=/packages --filter=H_**/files/digest-*"
PORTAGE_TMPDIR="/var/tmp"
PORTDIR="/usr/portage"
PORTDIR_OVERLAY="/usr/local/portage"
SYNC="rsync://rsync.gentoo.org/gentoo-portage"
USE="apache2 berkdb crypt hardened mailwrapper midi mysql nls pam pcre pic readline session ssl x86 xorg zlib" ALSA_PCM_PLUGINS="adpcm alaw asym copy dmix dshare dsnoop empty extplug file hooks iec958 ioplug ladspa lfloat linear meter mulaw multi null plug rate route share shm softvol" ELIBC="glibc" INPUT_DEVICES="mouse keyboard" KERNEL="linux" LCD_DEVICES="bayrad cfontz cfontz633 glk hd44780 lb216 lcdm001 mtxorb ncurses text" USERLAND="GNU" VIDEO_CARDS="fbdev dummy"
Unset: CTARGET, INSTALL_MASK, LANG, LC_ALL, LDFLAGS, LINGUAS, PORTAGE_COMPRESS, PORTAGE_COMPRESS_FLAGS, PORTAGE_RSYNC_EXTRA_OPTS
Ok, it was fixed after recompiling things with a new CHOST. See http://rafb.net/p/UDgt8681.html for some excerpt of my gentoo setup script that takes care of it on a new install. So I guess upstream has changed something regarding non-NPTL installations between 5.0.26 and 5.0.38?
Maybe this is related?

Quoting http://dev.mysql.com/doc/refman/5.0/en/releasenotes-cs-5-0-37.html

"A workaround was implemented to avoid a race condition in the NPTL pthread_exit() implementation. (Bug#24507)"

-> http://bugs.mysql.com/bug.php?id=24507
Sorry, I was a bit unclear. 5.0.38 does not work for me and I have CHOST=i686.
(In reply to comment #14)
> Sorry, I was a bit unclear. 5.0.38 does not work for me and I have CHOST=i686.
>

When you start mysql, do you see many mysqld processes with 'ps ax' or just 1? If you see more than 1 it doesn't use threads -- remerge glibc and try again.
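The "many processes or just 1" test above can be scripted. A minimal sketch, assuming a process listing with one bare command name per line on stdin; `count_mysqld` is a hypothetical helper.

```shell
# Hypothetical helper: count_mysqld
# Reads one command name per line on stdin and prints how many are exactly
# "mysqld" (so mysqld_safe and friends are not counted).
count_mysqld() {
    # -c prints the number of matching lines, -x matches whole lines only.
    # grep exits non-zero when the count is 0, so mask that with ":".
    grep -cx 'mysqld' || :
}
```

For example, `ps -e -o comm= | count_mysqld` prints the count for the running system. A result above 1 suggests mysqld is not running as NPTL threads.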
> When you start mysql, do you see many mysqld processes with 'ps ax' or just 1?
> If you see more than 1 it doesn't use threads -- remerge glibc and try again.
>

It works now, a recompile of glibc with nptl did the trick. Thanks.
Ok, so closing as invalid. Anybody on hardened needs to ensure they are using NPTL in their glibc.
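For checking which thread library glibc reports, `getconf GNU_LIBPTHREAD_VERSION` is one option. The sketch below classifies its output; `classify_threadlib` is a made-up name, and the sample strings in the comments (such as "NPTL 2.3.6") are assumptions about what glibc prints on a given system.

```shell
# Hypothetical helper: classify_threadlib STRING
# Maps the output of `getconf GNU_LIBPTHREAD_VERSION` to a short keyword.
classify_threadlib() {
    case $1 in
        NPTL*)         echo nptl ;;          # e.g. "NPTL 2.3.6" (assumed form)
        linuxthreads*) echo linuxthreads ;;  # e.g. "linuxthreads-0.10" (assumed form)
        *)             echo unknown ;;
    esac
}
```

Usage would be `classify_threadlib "$(getconf GNU_LIBPTHREAD_VERSION)"`; anything other than `nptl` is a reason to look at the glibc USE flags and CHOST.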
I would hardly call this invalid. This is clearly a bug in UPSTREAM. nptl can and should not be required.
(In reply to comment #13)
> Maybe this is related?
> Quoting http://dev.mysql.com/doc/refman/5.0/en/releasenotes-cs-5-0-37.html
>
> "A workaround was implemented to avoid a race condition in the NPTL
> pthread_exit() implementation. (Bug#24507)"
>
> -> http://bugs.mysql.com/bug.php?id=24507

Wolfram,
Can you please contact upstream and inform them that the fix in fact broke non-NPTL installs?
(In reply to comment #19)
> Wolfram,
> Can you please contact upstream and inform them that fix infact broke non nptl
> installs?

I'll try :)
Has anyone tried mysql-5.0.38 with glibc and USE="-nptl" on a non-hardened system?!
Ok, MySQL now has a bug report at http://bugs.mysql.com/bug.php?id=27977 Looking forward to replies...
Ok, duplicate of http://bugs.mysql.com/bug.php?id=27310 *sigh*
Quoting upstream:

--8<--
fixed in 5.0.42 and 5.1.18
--8<--

Latest 5.0 in portage is 5.0.38 -- we need a bump then :)
Ok, upstream correction regarding the fixed version: it was already fixed in 5.0.40.
Argh, first they say "fixed in $version", then they say "$version is not yet released". Standby for now :-/
Ok, 5.0.40 is available now: ftp://ftp.mysql.com/pub/mysql/src/mysql-5.0.40.tar.gz
My hardened servers are also suffering from this problem. When is the version bump scheduled?
5.0.40 in the tree now, resolved.
Test Result: SUCCESS

Fixed for me on previously failing system. With 5.0.38, this would hang forever:

=======================
lacewing init.d # !! ./mysql restart
 * Stopping mysql ...
 * Stopping mysqld (0) [ ok ]
 * Starting mysql ...
 * Starting mysql (/etc/mysql/my.cnf) [ ok ]
=======================

In fact, to get to this point, I had to stop 5.0.38 using a lot of manual intervention, but when it was finally completely gone, 5.0.40 came up cleanly and seems to be fine.
Thanks, closing fully.