After updating one of the Galera cluster nodes from mariadb-10.2.22 to mariadb-10.2.29, the systemd service fails to start. Here's what I see in the logs:

sh[4624]: /usr/bin/galera_recovery: line 71: /tmp/wsrep_recovery.Y7g0X2: Permission denied
systemd[1]: mariadb.service: Control process exited, code=exited, status=1/FAILURE
systemd[1]: mariadb.service: Failed with result 'exit-code'.
systemd[1]: Failed to start MariaDB 10.2.29 database server.

I diffed the service file and the galera_recovery script from the previous version and there don't seem to be any relevant differences. (Maybe this bug comes from upstream?)

If MariaDB is not configured as a Galera cluster, it starts without problems. /tmp is 777, managed by systemd-tmpfiles.

I was able to reproduce this bug in 2 different clusters (different Gentoo installations), so it might be reproducible with this configuration. After re-emerging version 10.2.22, it starts correctly.
Can you please try 10.2.30?
(In reply to Tomáš Mózes from comment #1)
> Can you please try 10.2.30?

Hi Tomáš, there is no such version in portage. Are you suggesting a version bump?
It's probably enough to copy the 10.2.29 ebuild to your local overlay as 10.2.30.
(In reply to Tomáš Mózes from comment #3)
> It's probably enough to copy the 10.2.29 ebuild to your local overlay as
> 10.2.30.

I made the version bump and nothing changed. So I tried to mangle the galera_recovery script until I got it working.

First I commented out the lines with chown and chmod on the tmp file (105-106):

# [ "$euid" = "0" ] && chown $user $log_file
# chmod 700 $log_file

Then I got a different error:

WSREP: Failed to start mysqld for wsrep recovery: '/usr/bin/galera_recovery: line 71: ./sbin/mysqld: No such file or directory'

So in that line I changed ./sbin/mysqld to mysqld. That way it finally started correctly. (Of course with some security problems for the file permissions?)

As I said before, the diff against the script from version 10.2.22 shows nothing like the changes I made, so I still don't get why the new version doesn't work.

BTW, I think you should be able to try the script locally with a single-node cluster, using the following my.cnf options:

wsrep_on=ON
wsrep_provider=/usr/lib/galera/libgalera_smm.so

(or refer to the Gentoo wiki Galera cluster page)

I'm not sure if a single-node cluster still triggers the script. I'll try in a few.
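For anyone who wants to reproduce this on a test box, here is a rough sketch that writes those two options into a separate config fragment instead of editing my.cnf by hand. The function name, the destination path and the [mysqld] section header are my assumptions; only the two wsrep lines come from the comment above.

```shell
# write_galera_test_cnf FILE: write a minimal Galera test fragment to FILE.
# The [mysqld] section header is an assumption; the two wsrep_* options are
# the ones quoted in the comment above.
write_galera_test_cnf() {
    printf '%s\n' \
        '[mysqld]' \
        'wsrep_on=ON' \
        'wsrep_provider=/usr/lib/galera/libgalera_smm.so' > "$1"
}
```

Call it with whatever path your setup includes from my.cnf, for example a hypothetical drop-in file under the directory named by your !includedir line, and delete the file again to go back to a non-Galera start.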
I currently use OpenRC, so cannot test right away :-/
I found out that what seemed to me a trivial difference between the scripts actually matters. Changing

print_defaults="/usr/libexec/mariadb/my_print_defaults"

back to

print_defaults="/usr/bin/my_print_defaults"

in the galera_recovery script made things work again. I don't know the difference between these two binaries; I just know the first comes from the dev-db/mariadb package and the second comes from dev-db/mysql-connector-c.

I leave this as a workaround for those who encounter the same problem, but I think the maintainer should have it checked.

BTW, I confirm what I said in my previous comment: just adding those two lines to the my.cnf configuration lets you test the script even on a single node.
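In case it helps others apply the same workaround without hand-editing, here is a minimal sketch that makes a patched copy and leaves the installed script alone. The function name and the destination path are hypothetical; the two my_print_defaults paths are the ones quoted above. This has only been reported to help on 10.2.29.

```shell
# patch_print_defaults SRC DST: copy SRC to DST with the my_print_defaults
# path switched from /usr/libexec/mariadb back to /usr/bin.
# SRC is left untouched; DST is made executable.
patch_print_defaults() {
    sed 's|/usr/libexec/mariadb/my_print_defaults|/usr/bin/my_print_defaults|' "$1" > "$2" &&
        chmod 755 "$2"
}
```

Example: patch_print_defaults /usr/bin/galera_recovery /usr/local/bin/galera_recovery (the destination is only an example). Pointing the unit at the copy is a separate step and not covered here.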
This problem persists in dev-db/mariadb-10.2.31
(Thanks for the previous fix that kept things rolling for the last 4 months :D)

No idea what changed in the latest patch, but the "fix" of changing my_print_defaults doesn't seem to work any more (it now gives an error about the --mysql parameter being bad).

There's something broken about the mktemp call here; it seems to be mktemp itself that's returning the permission denied(??).

My current bodge (probably insecure on a shared machine, but I need my cluster to work) is to just change line 28, and no longer do the previous stuff about changing my_print_defaults:

log_file=/tmp/wsrep_recovery ; cd /usr

It complains that mktemp failed (some check I didn't bother to decode), but at least it seems to start back up and rejoin the cluster now. It would be nice to figure out what's actually broken here so I don't have to bodge my database startup every time a new release gets emerged, but I'm a little busy, so I'm just dumping information here for now.

Also, before anyone panics: wsrep_cluster_size drops to 0 after the upgrade from 10.2 to 10.4, but eventually returns to whatever it should be for your cluster size. There are various other unanswered observations about this, and even while wsrep_cluster_size is zero, changes seem to be replicating between the nodes. Just not the sort of additional panic I needed on top of already bodging things :P
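To narrow down whether it is mktemp, the chmod, or the later write that gets the permission denied, something like the probe below might help. It is only a sketch: the function name is made up, and the wsrep_recovery.XXXXXX template is my guess based on the file name in the original log, not copied from galera_recovery.

```shell
# probe_tmp_steps [DIR]: repeat the create / chmod / write steps in DIR
# (default /tmp) and report the first one that fails.
probe_tmp_steps() {
    dir=${1:-/tmp}
    # Template name is an assumption based on the path seen in the log.
    f=$(mktemp "$dir/wsrep_recovery.XXXXXX") || {
        echo "mktemp failed in $dir"; return 1; }
    chmod 600 "$f" || {
        echo "chmod failed on $f"; rm -f "$f"; return 2; }
    # Opening the file for writing is what a shell redirection does.
    if ! echo probe > "$f"; then
        echo "write failed on $f"; rm -f "$f"; return 3
    fi
    rm -f "$f"
    echo "ok: $(id -un) can create and write in $dir"
}
```

It is most informative when run as the same user, and ideally from the same systemd context, that galera_recovery runs in, since a root shell on the console may succeed where the service fails.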
JFYI: At the moment I am the only one still active, left in Gentoo's mysql project. I don't use systemd and I don't use galera cluster. So if you are waiting for "us" to do something, you will probably wait for a long time... sorry about that. Patches are welcome.
Not sure of the implications, but this worked for me:

vi /usr/bin/galera_recovery
#just commented out the safety checks entirely

# Safety checks
#if [ -n "$log_file" -a -f "$log_file" ]; then
#  [ "$euid" = "0" ] && chown $user $log_file
#  chmod 600 $log_file
#else
#  log "WSREP: mktemp failed"
#fi