
Bug 649924

Summary: =dev-db/mariadb-10.1.29 with =sys-auth/libnss-mysql-1.5_p20060915-r3 - checkconfig crashes if mysqld is already running and libnss-mysql is in use
Product: Gentoo Linux
Component: Current packages
Reporter: Jaco Kroon <jaco>
Assignee: Gentoo Linux MySQL bugs team <mysql-bugs>
Status: CONFIRMED ---    
Severity: normal    
Priority: Normal    
Version: unspecified   
Hardware: All   
OS: Linux   
Whiteboard:
Package list:
Runtime testing required: ---

Description Jaco Kroon 2018-03-08 15:49:37 UTC
When mysqld (from mariadb 10.1.29 at least) is running and libnss-mysql is in use, the checkconfig step in /etc/init.d/mysql segfaults and the restart fails.

I suspect this is because mysqld links in its own internal version of mysql_real_connect, which is not properly initialized by the time mysqld calls my_set_user(), which in turn calls libc's initgroups(). I'm not sure whether this is a libnss-mysql or a mariadb issue, so guidance on which upstream this should be reported to (in addition to filing it here) would be greatly appreciated. Possibly this is just the way we build mariadb (surely the daemon could use the client library just as easily as compiled-in variants?).

The backtrace output by the command /usr/sbin/mysqld --defaults-file=/etc/mysql/my.cnf --help --verbose is as follows:

Thread pointer: 0x0
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0x0 thread_stack 0x48400
/usr/sbin/mysqld(my_print_stacktrace+0x29)[0x555555eb9ec9]
/usr/sbin/mysqld(handle_fatal_signal+0x3ad)[0x555555a9e5fd]
/lib64/libpthread.so.0(+0x13be0)[0x7ffff6b97be0]
/usr/sbin/mysqld(thd_increment_bytes_received+0x0)[0x55555591a670]
/usr/sbin/mysqld(+0x374c7e)[0x5555558c8c7e]
/usr/sbin/mysqld(my_net_read_packet+0x191)[0x5555558c9a11]
/usr/sbin/mysqld(cli_safe_read+0x2f)[0x555555a7bbbf]
/usr/sbin/mysqld(mysql_real_connect+0x403)[0x555555a7f973]
/usr/lib64/libnss_mysql.so.2(+0x3518)[0x7ffff4baf518]
/usr/lib64/libnss_mysql.so.2(+0x3b4a)[0x7ffff4bafb4a]
/usr/lib64/libnss_mysql.so.2(+0x4054)[0x7ffff4bb0054]
/usr/lib64/libnss_mysql.so.2(_nss_mysql_initgroups_dyn+0xc3)[0x7ffff4bb0c83]
/lib64/libc.so.6(+0xbffb2)[0x7ffff58c8fb2]
/lib64/libc.so.6(initgroups+0x77)[0x7ffff58c9297]
/usr/sbin/mysqld(my_set_user+0x14)[0x555555ec9b34]
/usr/sbin/mysqld(+0x36dba8)[0x5555558c1ba8]
/usr/sbin/mysqld(_Z11mysqld_mainiPPc+0xa8f)[0x5555558c7d7f]
/lib64/libc.so.6(__libc_start_main+0xf1)[0x7ffff5829521]
/usr/sbin/mysqld(_start+0x2a)[0x5555558bc0da]
The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
information that should help you find out what is causing the crash.

This crash occurred while the server was calling initgroups(). This is
often due to the use of a mysqld that is statically linked against 
glibc and configured to use LDAP in /etc/nsswitch.conf.
You will need to either upgrade to a version of glibc that does not
have this problem (2.3.4 or later when used with nscd),
disable LDAP in your nsswitch.conf, or use a mysqld that is not statically linked.

ldd in this case indicates that mysqld is NOT statically linked.

gdb provides a little more detail (but doesn't actually indicate which variant of the various functions is being used, as discussed above):

(gdb) bt
#0  thd_increment_bytes_received (thd=0x0, length=4)
    at /var/tmp/portage/dev-db/mariadb-10.1.29/work/mysql/sql/sql_class.cc:4084
#1  0x00005555558c8c7e in my_real_read (net=0x7ffff4db7248 <ci+8>, 
    complen=complen@entry=0x7fffffffcf38, header=header@entry=0 '\000')
    at /var/tmp/portage/dev-db/mariadb-10.1.29/work/mysql/sql/net_serv.cc:952
#2  0x00005555558c9a11 in my_net_read_packet (net=net@entry=0x7ffff4db7248 <ci+8>, 
    read_from_server=read_from_server@entry=0 '\000')
    at /var/tmp/portage/dev-db/mariadb-10.1.29/work/mysql/sql/net_serv.cc:1138
#3  0x0000555555a7bbbf in cli_safe_read (mysql=mysql@entry=0x7ffff4db7248 <ci+8>)
    at /var/tmp/portage/dev-db/mariadb-10.1.29/work/mysql/sql-common/client.c:581
#4  0x0000555555a7f973 in mysql_real_connect (mysql=mysql@entry=0x7ffff4db7248 <ci+8>, 
    host=0x555555ef7eff "localhost", host@entry=0x7ffff4db59c4 <conf+10244> "localhost", 
    user=<optimized out>, passwd=0x7ffff4db69c4 <conf+14340> "E8Eh0aacMGXdtdhO", 
    db=<optimized out>, db@entry=0x7ffff4db6dc4 <conf+15364> "uls", port=3306, 
    unix_socket=0x555556f9c078 "/var/run/mysqld/mysqld.sock", client_flag=0)
    at /var/tmp/portage/dev-db/mariadb-10.1.29/work/mysql/sql-common/client.c:3451
#5  0x00007ffff4baf518 in _nss_mysql_connect_sql (mresult=mresult@entry=0x7fffffffde48)
    at mysql.c:233
#6  0x00007ffff4bafb4a in _nss_mysql_escape_string (to=to@entry=0x7fffffffd4a0 "", 
    from=from@entry=0x555556f5b0ff "mysql", mresult=mresult@entry=0x7fffffffde48) at mysql.c:344
#7  0x00007ffff4bb0054 in _nss_mysql_build_query (caller=0x7ffff4bb17a6 "initgroups", 
    mresult=0x7fffffffde48, qout=0x7fffffffd5b0 "`¡\201õÿ\177", 
    qin=0x7ffff4db51c4 <conf+8196> "SELECT group_id+1000 FROM user, user_group WHERE user.user_id=user_group.user_id and username='%1$s'", num=0, name=0x555556f5b0ff "mysql", ltype=BYNAME)
    at lookup.c:68
#8  _nss_mysql_lookup (ltype=ltype@entry=BYNAME, name=name@entry=0x555556f5b0ff "mysql", 
    num=num@entry=0, 
    q=0x7ffff4db51c4 <conf+8196> "SELECT group_id+1000 FROM user, user_group WHERE user.user_id=user_group.user_id and username='%1$s'", restricted=restricted@entry=nfalse, 
    result=result@entry=0x7fffffffde50, buffer=0x0, buflen=0, errnop=0x7ffff7fdf6b0, 
    load_func=0x7ffff4baeea0 <_nss_mysql_load_gidsbymem>, mresult=0x7fffffffde48, 
    caller=0x7ffff4bb17a6 "initgroups") at lookup.c:162
#9  0x00007ffff4bb0c83 in _nss_mysql_initgroups_dyn (user=0x555556f5b0ff "mysql", 
    group=<optimized out>, start=0x7fffffffdef0, size=0x7fffffffdf58, groupsp=0x7fffffffdf60, 
    limit=65536, errnop=0x7ffff7fdf6b0) at mysql-grp.c:182
#10 0x00007ffff58c8fb2 in ?? () from /lib64/libc.so.6
#11 0x00007ffff58c9297 in initgroups () from /lib64/libc.so.6
#12 0x0000555555ec9b34 in my_set_user (user=<optimized out>, user_info=0x7ffff5bb7d60, 
    MyFlags=MyFlags@entry=16)
    at /var/tmp/portage/dev-db/mariadb-10.1.29/work/mysql/mysys/my_setuser.c:71
#13 0x00005555558c1ba8 in set_user (user=<optimized out>, user_info_arg=<optimized out>)
    at /var/tmp/portage/dev-db/mariadb-10.1.29/work/mysql/sql/mysqld.cc:2371
#14 0x00005555558c7d7f in mysqld_main (argc=<optimized out>, argv=<optimized out>)
    at /var/tmp/portage/dev-db/mariadb-10.1.29/work/mysql/sql/mysqld.cc:5716
#15 0x00007ffff5829521 in __libc_start_main () from /lib64/libc.so.6
#16 0x00005555558bc0da in _start ()


I've started seeing this on a lot of our machines of late, and it's easily worked around by doing:

/etc/init.d/mysql stop && /etc/init.d/mysql start

instead of /etc/init.d/mysql restart ... with the added risk that the config is not validated first, so the start could fail.

I suspect mariadb has those functions inside the binary in the first place because that code is used internally by replication clients (slaves).

Very unsure of how to proceed. The reason it doesn't segfault when mysqld isn't already running is that the client code backs out when its connection to the mysql server is refused.
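To illustrate the not-running case: a minimal sketch (not libnss-mysql code; try_unix_connect is a hypothetical helper) of what a client sees when it connects to the socket path from the backtrace (/var/run/mysqld/mysqld.sock, used because host is "localhost") while no server is listening. connect() fails up front, so the client gives up before it ever reads a packet and the my_real_read path from the backtrace is never reached.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/* Try to open a stream connection to a Unix domain socket.
 * Returns 0 on success, or the errno value from socket()/connect(). */
static int try_unix_connect(const char *path)
{
    struct sockaddr_un addr;
    int fd;
    int err = 0;

    if (strlen(path) >= sizeof(addr.sun_path))
        return ENAMETOOLONG;

    fd = socket(AF_UNIX, SOCK_STREAM, 0);
    if (fd < 0)
        return errno;

    memset(&addr, 0, sizeof(addr));
    addr.sun_family = AF_UNIX;
    strcpy(addr.sun_path, path);

    /* With no server listening, connect() fails here, typically with
     * ENOENT (socket file missing) or ECONNREFUSED (stale socket file),
     * so no packet is ever read from the connection. */
    if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0)
        err = errno;

    close(fd);
    return err;
}
```

Calling try_unix_connect("/var/run/mysqld/mysqld.sock") on a host where mysqld is stopped should return a non-zero errno rather than 0.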

Reproducible: Always
Comment 1 Jaco Kroon 2018-06-27 11:56:21 UTC
Just some additional information I've managed to track down this morning on a debug host:

arthur ~ # for file in /usr/sbin/mysqld /usr/lib64/libmysqlclient.so.18 /usr/lib64/libnss_mysql.so.2.0.0; do echo $file:; objdump -T $file | grep '[[:space:]]mysql_real_connect$'; done
/usr/sbin/mysqld:
000000000052c780 g    DF .text  0000000000000dae  Base        mysql_real_connect
/usr/lib64/libmysqlclient.so.18:
000000000002dd00 g    DF .text  0000000000000000 (libmysqlclient_16) mysql_real_connect
000000000002dd00 g    DF .text  0000000000000e45  libmysqlclient_18 mysql_real_connect
/usr/lib64/libnss_mysql.so.2.0.0:
0000000000000000      DF *UND*  0000000000000000  libmysqlclient_18 mysql_real_connect


Based on the above, libnss_mysql should (to my understanding) invoke the version from the shared library, since the version matches. The library should be initialised.

/usr/sbin/mysqld:
000000000052a320 g    DF .text  00000000000000bf  Base        mysql_init
/usr/lib64/libmysqlclient.so.18:
000000000002b820 g    DF .text  00000000000000cf  libmysqlclient_18 mysql_init
000000000002b820 g    DF .text  0000000000000000 (libmysqlclient_16) mysql_init
/usr/lib64/libnss_mysql.so.2.0.0:
0000000000000000      DF *UND*  0000000000000000  libmysqlclient_18 mysql_init

Again, this should not be a problem, assuming there are no other undesired interactions between the two.

#0 The code resulting in the problem (unwinding the stack):

void thd_increment_bytes_received(void *thd, ulong length)
{
  ((THD*) thd)->status_var.bytes_received+= length;
}

This implies that thd is likely NULL, and GDB confirms it (thd=0x0 in frame #0).

Going down to #1, thd comes from net->thd, which isn't touched anywhere else in my_real_read as far as I can see.


#2 calls #1 first and definitely doesn't touch net->thd.

#3 cli_safe_read gets the net argument from mysql->net.

#4 mysql_real_connect reveals that net is a struct (not a pointer to one) inside mysql, and mysql_real_connect uses & to get the NET* it passes to the functions it calls. It also doesn't touch net->thd, only vio (quite a bit).
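The gdb output is consistent with this: in frames #1 to #4, net and mysql have the same address (0x7ffff4db7248), which is what you would expect if NET is embedded by value as the first member of MYSQL. A minimal sketch with hypothetical, heavily simplified struct definitions (the real ones in mysql.h have many more fields; only the "net comes first" property is assumed here):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical simplified stand-in for NET. */
typedef struct {
    void *vio;   /* connection handle, used heavily by mysql_real_connect */
    void *thd;   /* server-side THD pointer, only meaningful inside mysqld */
} NET;

/* Hypothetical simplified stand-in for MYSQL: NET embedded, not a pointer. */
typedef struct {
    NET net;
    const char *host;
} MYSQL;

/* Mirrors how the client code derives the NET* it passes down: &mysql->net. */
static NET *net_of(MYSQL *mysql)
{
    return &mysql->net;
}

/* Because net is the first member, &mysql->net has the same address as mysql. */
_Static_assert(offsetof(MYSQL, net) == 0, "NET is expected to be the first member of MYSQL");
```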

OK, so the only other candidate is mysql_init(), which is called on the MYSQL* just before it is handed to mysql_real_connect.

mysql_init sets the entire struct to zero, and then starts filling it out.

net->thd is thus never initialized, and I'm not sure where it would normally get initialized from.
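A minimal sketch of why the zeroing leaves net->thd NULL, using hypothetical simplified NET/MYSQL stand-ins; sketch_mysql_init is a hypothetical name and only models the zeroing step, not the rest of the real mysql_init():

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical simplified stand-ins: NET embedded by value in MYSQL. */
typedef struct {
    void *vio;
    void *thd;
} NET;

typedef struct {
    NET net;
} MYSQL;

/* Models only the zeroing step described above. The real mysql_init()
 * goes on to fill in further client defaults, but (per the analysis above)
 * never sets net.thd. On Linux, all-bits-zero is a NULL pointer, so
 * net.thd ends up NULL. */
static MYSQL *sketch_mysql_init(MYSQL *mysql)
{
    if (mysql == NULL)
        return calloc(1, sizeof(MYSQL));
    memset(mysql, 0, sizeof(*mysql));
    return mysql;
}
```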

For the *client* this may not be a problem, because in sql/net_serv.cc (from line 99 onwards):

#ifdef MYSQL_SERVER
...
#else
#define update_statistics(A)
#define thd_net_is_killed() 0
#endif

And then on the line in #1:

update_statistics(thd_increment_bytes_received(net->thd, length));

This implies that in the client library this call never happens. So somehow we end up calling the mysqld version of my_real_read instead of the client library version. According to objdump this symbol isn't exported from anywhere.
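A reduced sketch of that mechanism: the same call site compiles to a real call in the server build and to nothing in the client library. The server-side definition update_statistics(A) A is an assumption (that branch is elided in the excerpt above); the stand-in hook, hook_calls and account_read are hypothetical names for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Counts calls to the stand-in hook below (illustration only). */
static int hook_calls = 0;

/* Stand-in for the server's thd_increment_bytes_received(). The real one
 * dereferences thd unconditionally; this version only counts calls. */
static void thd_increment_bytes_received(void *thd, unsigned long length)
{
    (void)thd;
    (void)length;
    hook_calls++;
}

#ifdef MYSQL_SERVER
#define update_statistics(A) A   /* assumed server-side definition */
#else
#define update_statistics(A)     /* client build: argument is discarded */
#endif

/* Reduced form of the statement in my_real_read(). */
static void account_read(void *thd, unsigned long length)
{
    (void)thd;
    (void)length;
    update_statistics(thd_increment_bytes_received(thd, length));
}
```

Compiled without MYSQL_SERVER, as the client library would be, account_read(NULL, 4) never calls the hook, so a NULL thd is harmless there; with -DMYSQL_SERVER the same line calls it with the NULL pointer.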

my_net_read_packet (#2) is exported from mysqld, but neither libmysqlclient nor libnss_mysql references that symbol (according to objdump -T at least). The same goes for cli_safe_read (#3).

The first symbol in the chain that is imported/exported is mysql_real_connect, which implies that libnss_mysql is binding to the "Base" version in mysqld instead of the versioned variant from libmysqlclient.
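A toy model (not the real glibc algorithm) of why that can happen: a module loaded by NSS, such as libnss_mysql.so.2, resolves its undefined symbols by searching the global scope (the executable first, then its dependencies) before its own dependencies, and the first matching definition wins. The rule that an unversioned ("Base") definition satisfies a versioned reference is an assumption here, consistent with what the backtraces show; Definition, matches and resolve are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One exported definition in one loaded object. */
typedef struct {
    const char *object;   /* object file name */
    const char *symbol;   /* exported symbol name */
    const char *version;  /* version tag, or NULL for unversioned ("Base") */
} Definition;

static int matches(const Definition *d, const char *sym, const char *ver)
{
    if (strcmp(d->symbol, sym) != 0)
        return 0;
    /* Assumption: an unversioned definition satisfies a versioned reference.
     * Simplification: an unversioned reference accepts any version. */
    if (d->version == NULL || ver == NULL)
        return 1;
    return strcmp(d->version, ver) == 0;
}

/* Walk the search scope in order; the first matching definition wins. */
static const char *resolve(const Definition *scope, size_t n,
                           const char *sym, const char *ver)
{
    for (size_t i = 0; i < n; i++)
        if (matches(&scope[i], sym, ver))
            return scope[i].object;
    return NULL;
}
```

With a scope of { mysqld (mysql_real_connect, unversioned), libmysqlclient.so.18 (mysql_real_connect@libmysqlclient_18) }, resolve() returns mysqld for a libmysqlclient_18 reference, matching frame #4; with mysqld removed from the front it returns libmysqlclient.so.18.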

Not sure how to approach that particular issue. A few possible solutions I can think of offhand:

1.  Test for thd being non-NULL in thd_increment_bytes_received (or even in its caller). There seem to be similar checks in thd_increment_bytes_sent (the equivalent function for sent bytes). This is likely to just have a cascading effect of more such locations. This one I can attempt.

2.  Fix it so that the libmysqlclient versions are used and not the mysqld ones. This would likely be the better solution; or possibly:

3.  Don't export the symbols from the mysqld binary to begin with. This might be the way to achieve (2).
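Option 1 could look roughly like this, sketched against hypothetical minimal THD/STATUS_VAR stand-ins (the real THD is a large C++ class, and ulong is replaced by unsigned long here). It only avoids this particular NULL dereference; the client code would still be running inside mysqld's copy of the library.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical minimal stand-ins for the server types. */
typedef struct {
    unsigned long bytes_received;
} STATUS_VAR;

typedef struct {
    STATUS_VAR status_var;
} THD;

/* Option 1: skip the statistics update when there is no THD,
 * instead of dereferencing a NULL pointer. */
void thd_increment_bytes_received(void *thd, unsigned long length)
{
    if (thd != NULL)
        ((THD *)thd)->status_var.bytes_received += length;
}
```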

If someone can point me in the right direction I'll attempt a patch. For 2/3 I would need some pointers on where to begin; 1 I should be OK with.
Comment 2 Jaco Kroon 2018-09-02 17:41:40 UTC
Just spotted this again with:

 * Searching for mariadb ...
 * Searching for libnss-mysql ...
 * Searching for mysql-connector-c ...

This crash occurred while the server was calling initgroups(). This is
often due to the use of a mysqld that is statically linked against 
glibc and configured to use LDAP in /etc/nsswitch.conf.
You will need to either upgrade to a version of glibc that does not
have this problem (2.3.4 or later when used with nscd),
disable LDAP in your nsswitch.conf, or use a mysqld that is not statically linked.

This implies mysqld might be statically linked, and "ldd $(which mysqld)" indeed shows that mysqld (while not statically linked) doesn't link against libmysqlclient.so. So I have to assume that it uses its own internal copy of the client code for replication, which is probably where libnss-mysql picks up the conflict.



(gdb) bt
#0  thd_increment_bytes_received (thd=0x0, length=4) at /var/tmp/portage/dev-db/mariadb-10.1.34/work/mysql/sql/sql_class.cc:3994
#1  0x00005555558c9cbe in my_real_read (net=0x7ffff539b228, complen=complen@entry=0x7fffffffce68, header=header@entry=0 '\000') at /var/tmp/portage/dev-db/mariadb-10.1.34/work/mysql/sql/net_serv.cc:954
#2  0x00005555558caa51 in my_net_read_packet (net=net@entry=0x7ffff539b228, read_from_server=read_from_server@entry=0 '\000') at /var/tmp/portage/dev-db/mariadb-10.1.34/work/mysql/sql/net_serv.cc:1140
#3  0x0000555555a7a85f in cli_safe_read (mysql=mysql@entry=0x7ffff539b228) at /var/tmp/portage/dev-db/mariadb-10.1.34/work/mysql/sql-common/client.c:581
#4  0x0000555555a7e6cc in mysql_real_connect (mysql=0x7ffff539b228, host=0x555555efa298 "localhost", user=<optimized out>, passwd=0x7ffff539a9a4 "0yEY6fI815KiEWZT", db=<optimized out>, port=3306, unix_socket=0x555556fadf58 "/var/run/mysqld/mysqld.sock", client_flag=0)
    at /var/tmp/portage/dev-db/mariadb-10.1.34/work/mysql/sql-common/client.c:3467
#5  0x00007ffff519495e in ?? () from /usr/lib64/libnss_mysql.so.2
#6  0x00007ffff5194ce7 in ?? () from /usr/lib64/libnss_mysql.so.2
#7  0x00007ffff5194f5e in ?? () from /usr/lib64/libnss_mysql.so.2
#8  0x00007ffff5195752 in _nss_mysql_initgroups_dyn () from /usr/lib64/libnss_mysql.so.2
#9  0x00007ffff58873b2 in ?? () from /lib64/libc.so.6
#10 0x00007ffff5887697 in initgroups () from /lib64/libc.so.6
#11 0x0000555555ecbf14 in my_set_user (user=<optimized out>, user_info=0x7ffff5b7dec0, MyFlags=MyFlags@entry=16) at /var/tmp/portage/dev-db/mariadb-10.1.34/work/mysql/mysys/my_setuser.c:71
#12 0x00005555558c2b88 in set_user (user=<optimized out>, user_info_arg=<optimized out>) at /var/tmp/portage/dev-db/mariadb-10.1.34/work/mysql/sql/mysqld.cc:2371
#13 0x00005555558c8cf7 in mysqld_main (argc=<optimized out>, argv=<optimized out>) at /var/tmp/portage/dev-db/mariadb-10.1.34/work/mysql/sql/mysqld.cc:5726
#14 0x00007ffff57e0011 in __libc_start_main () from /lib64/libc.so.6
#15 0x00005555558bd10a in _start ()

This confirms that the call goes back into mysqld instead of into libmysqlclient. I suspect mariadb should not export the client symbols, so that libnss-mysql picks up the symbols from the library instead.

I don't know how to do that.
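For the "don't export the client symbols" idea, one general mechanism is ELF symbol visibility: a definition marked hidden stays callable from code linked into the same binary but is left out of the dynamic symbol table, so a module loaded later (such as libnss_mysql.so.2 via NSS) cannot bind to it. GCC and Clang also offer -fvisibility=hidden to make this the default for a whole translation unit. A minimal sketch with a hypothetical function name; whether MariaDB's build can apply this to its embedded client code, and whether server plugins rely on those exports, is not established here.

```c
#include <assert.h>
#include <stddef.h>

/* GCC/Clang attribute: the symbol can still be referenced from other
 * translation units linked into the same binary, but it is not placed in
 * the dynamic symbol table, so other loaded objects cannot bind to it. */
#if defined(__GNUC__)
#define INTERNAL_ONLY __attribute__((visibility("hidden")))
#else
#define INTERNAL_ONLY
#endif

/* Hypothetical stand-in for a client function compiled into the server. */
INTERNAL_ONLY int embedded_client_connect(const char *host)
{
    return host != NULL && host[0] != '\0';
}
```

If something like this were applied to mysqld's copy of mysql_real_connect, objdump -T on mysqld would no longer list it, and libnss_mysql's mysql_real_connect@libmysqlclient_18 reference would presumably fall through to libmysqlclient.so.18.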