I have several hardened Gentoo hosts that run Docker. The hosts and the containers are based on the same hardened Gentoo image. When I try to set up replication between mariadb-galera containers using the wsrep rsync backend, the following problem arises: /usr/bin/wsrep_sst_rsync is called to set up rsync for the initial sync. This script relies on lsof in check_pid_and_port(). However, due to capability restrictions (probably from both Docker and grsecurity), lsof (and any similar tool, such as netstat) won't work as uid mysql, so check_pid_and_port() loops endlessly and blocks mariadb from starting up.
As a workaround, I disabled the call to check_pid_and_port. Here is what I did:
    sed -i 's/until check_pid_and_port.*/until \$(check_pid \$RSYNC_PID \&\& \[ \$(cat \$RSYNC_PID) -eq \$RSYNC_REAL_PID \])/' /usr/bin/wsrep_sst_rsync
It does the trick, but does not solve the underlying problem (if it can be solved at all without loosening restrictions on the host).
Given the exotic nature of my setup, I did not know to whom I should direct my 'bug' report. So here you are :)
When you do find out where to direct this bug report (blame a package for instance) feel free to provide that information and reopen this bug report. Meanwhile, we have a web forum, IRC channels and mailing lists for general support questions.
Remove the Docker component from the bug report, and it reads like this: on hardened Gentoo with the default grsecurity config in the kernel, *any* call to lsof and the like in scripts not running as root will trigger this bug/feature. So in this scenario, package dev-db/mariadb-galera is to blame for relying on tools (lsof) that are not safe to use as non-root. But overall it is a hardened Gentoo thing. Your call.
--- a/scripts/wsrep_sst_rsync.sh
+++ b/scripts/wsrep_sst_rsync.sh
@@ -278,7 +278,7 @@
 rsync --daemon --no-detach --port $RSYNC_PORT --config "$RSYNC_CONF" &
 RSYNC_REAL_PID=$!
-until check_pid_and_port $RSYNC_PID $RSYNC_REAL_PID $RSYNC_PORT
+until $(check_pid $RSYNC_PID && [ $(cat $RSYNC_PID) -eq $RSYNC_REAL_PID ])
 do
 sleep 0.2
 done
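For reference, the sed one-liner from the original report should produce the same until line as this diff. The dry run below applies the expression to a sample line taken from the hunk instead of the installed script, so nothing on disk is modified. It is only a sketch to show how the backslash escapes in the replacement resolve (tested reasoning assumes GNU sed); the variable names sample and patched are made up for illustration.

```shell
#!/bin/sh
# Sample input: the loop header as it appears in the unpatched script.
sample='until check_pid_and_port $RSYNC_PID $RSYNC_REAL_PID $RSYNC_PORT'

# Same sed expression as in the report, applied to stdin instead of the file.
patched=$(printf '%s\n' "$sample" | sed 's/until check_pid_and_port.*/until \$(check_pid \$RSYNC_PID \&\& \[ \$(cat \$RSYNC_PID) -eq \$RSYNC_REAL_PID \])/')

printf '%s\n' "$patched"
# prints: until $(check_pid $RSYNC_PID && [ $(cat $RSYNC_PID) -eq $RSYNC_REAL_PID ])
```

Running the same expression without -i against a copy of the script is a cheap way to check the result before touching /usr/bin/wsrep_sst_rsync.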
If the rsync SST method does not work, try another. The xtrabackup-v2 SST method does not use lsof. It uses xtrabackup-bin and either socat or nc. These methods do need some setup; there is documentation online, along with accounts of working xtrabackup-v2 setups. Rsync is the default because it needs minimal setup. I will be adding USE flags for these deps soon.
If this patch reliably solves the issue, consider opening a bug with MariaDB at https://mariadb.atlassian.net/ and linking it here. The MariaDB project is very responsive.
@patch
I can't tell if the patch works reliably in all scenarios. It basically just skips two checks for the rsync daemon.
The 'is it up' check via lsof might be pointless, as the pid is checked afterwards anyway. However, removing lsof shortens the runtime of the (now omitted) function. Under heavy load, that might make it possible for the pid file to be there while rsync is not yet listening (if rsync writes its pid before listening; I don't know about that), which in turn might result in unexpected behavior from mysql if it does not expect this (which I also don't know). If so, the risk could be reduced to near zero by changing the sleep in the until loop from 'sleep 0.2' to 'sleep 2'. 'sleep 0.2' makes little sense in the original script anyway, because lsof takes several seconds to complete.
The 'is another program than our rsync listening on the rsync port' check (which relies on the lsof output from before) does make sense, though. As I did not see any other method of implementing these checks under uid mysql (feel free to correct me here), I just omitted them, because their absence does not affect my (and probably most) setups.
I doubt that the MariaDB folks will remove them just for the sake of grsecurity and Docker users, so hardened Gentoo users should be aware of this and either not use rsync or use the provided patch.
@xtrabackup-v2
Unfortunately I will not be able to try this for the next month, as I am on vacation, but I will let you know once I get to it if it doesn't work either. :)
Meanwhile, thank you for the great work on this package!
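On the question of whether the 'is it up' check can be done at all as uid mysql: one approach that does not inspect other processes is to simply try connecting to the port. The following is a minimal sketch, not something taken from wsrep_sst_rsync. It assumes bash was built with network redirection support (the net USE flag on Gentoo, as far as I know) and that the grsecurity policy does not restrict client sockets for this user. port_is_open and wait_for_port are hypothetical helper names. Note that this only shows that something accepts connections on the port; it cannot tell whether that something is our rsync, so it does not replace the ownership check.

```shell
#!/bin/bash
# Hypothetical helper: succeeds if something accepts TCP connections on host:port.
port_is_open() {
    local host=$1 port=$2
    # The subshell opens fd 3 on bash's /dev/tcp pseudo-device; the fd is
    # closed again when the subshell exits. Errors are silenced.
    (exec 3<>"/dev/tcp/${host}/${port}") 2>/dev/null
}

# Hypothetical helper: poll until the port opens or the attempts run out,
# so a daemon that never comes up cannot block startup forever.
wait_for_port() {
    local host=$1 port=$2 tries=${3:-50}
    while [ "$tries" -gt 0 ]; do
        port_is_open "$host" "$port" && return 0
        tries=$((tries - 1))
        sleep 0.2
    done
    return 1
}
```

A call such as wait_for_port 127.0.0.1 "$RSYNC_PORT" 50 could follow the pid check in the until loop. The bounded retry count is a design choice of this sketch, meant to avoid the endless loop described in the original report.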
I have added 2 new USE flags to pull in the dependencies for the SST methods. This should make things easier. sst-rsync is enabled by default, as rsync is the method Galera expects by default.
I switched to xtrabackup-v2.
    # echo "dev-db/mariadb-galera sst-xtrabackup" >> /etc/portage/package.use
    # emerge dev-db/mariadb-galera
results in a circular dependency (xtrabackup's Perl modules require mysql). I worked around this by emerging dev-db/mariadb-galera first, then setting the USE flag and re-emerging it. Is there another way?
---
xtrabackup-v2 seems to work more reliably than rsync, but also fails to sync every so often. I noticed that the default my.cnf file has unreasonable defaults for a Galera cluster, which (in the case of innodb_log_file_size) might be the cause:
    ### unknown option '--loose-federated'
    #loose-federated
    ### will be deprecated
    #innodb_additional_mem_pool_size = 2M
    ### something like 2G might be more appropriate as sane default in 2015
    innodb_buffer_pool_size = 16M
    ### 512M might be a sane default, i use 2G, see https://stackoverflow.com/questions/23966539/percona-xtradbcluster-error-while-getting-data-from-donor-node
    innodb_log_file_size = 5M
In one case, I had to reinitialize a cluster for the nodes to rejoin. I will let you know if changing innodb_log_file_size fixes my issues.
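The two-pass workaround for the circular dependency described above can be written down roughly as follows. This is only a sketch of the steps from the comment, not an alternative fix. EMERGE and PKG_USE_FILE are hypothetical override variables added here so the steps can be dry-run (for example with EMERGE=echo), and it assumes /etc/portage/package.use is a plain file, as in the command above, rather than a directory.

```shell
#!/bin/sh
# Hypothetical knobs for dry runs; the defaults match the commands in the comment.
EMERGE=${EMERGE:-emerge}
PKG_USE_FILE=${PKG_USE_FILE:-/etc/portage/package.use}
PKG=dev-db/mariadb-galera

two_pass_build() {
    # Pass 1: build without the flag so a mysql provider is installed first.
    $EMERGE "$PKG" || return 1
    # Enable the flag only after the first build has succeeded.
    echo "$PKG sst-xtrabackup" >> "$PKG_USE_FILE" || return 1
    # Pass 2: rebuild so the changed USE flag pulls in the xtrabackup deps.
    $EMERGE --newuse "$PKG"
}
```

Calling two_pass_build as root performs both passes; with EMERGE=echo and PKG_USE_FILE pointed at a scratch file it only prints the emerge arguments.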
Package removed