Bug 648356 - app-admin/restart-services-0.14.1 always restarts postgresql if pg uses hugepages
Status: RESOLVED FIXED
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: Current packages
Hardware: All Linux
Importance: Normal normal
Assignee: Marc Schiffbauer
Reported: 2018-02-21 09:57 UTC by Marcin Mirosław
Modified: 2018-11-19 10:34 UTC

Description Marcin Mirosław 2018-02-21 09:57:33 UTC
restart-services incorrectly restarts postgresql if postgresql uses hugepages. lib_users returns:
# lib_users -m
8366;/anon_hugepage;postgres: autovacuum launcher process
8360;/anon_hugepage;/usr/lib64/postgresql-9.5/bin/postgres -D /etc/postgresql-9.5 --data-directory=/dane/bazy/postgresql/9.5/ --unix-socket-directories=/run/postgresql
28294;/anon_hugepage;postgres: dovecot exim 127.0.0.1(7244) idle
8365;/anon_hugepage;postgres: wal writer process
8363;/anon_hugepage;postgres: checkpointer process
28295;/anon_hugepage;postgres: dovecot exim 127.0.0.1(7246) idle
8364;/anon_hugepage;postgres: writer process
28298;/anon_hugepage;postgres: dovecot exim 127.0.0.1(7250) idle
28297;/anon_hugepage;postgres: dovecot exim 127.0.0.1(7248) idle
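A minimal Python sketch of how the machine-readable output above can be read, assuming each line has three `;`-separated fields (PID, reported file, command line) as the sample suggests. The function names are illustrative and not part of lib_users.

```python
from collections import defaultdict

# Three lines copied from the `lib_users -m` output above.
SAMPLE = """\
8366;/anon_hugepage;postgres: autovacuum launcher process
8360;/anon_hugepage;/usr/lib64/postgresql-9.5/bin/postgres -D /etc/postgresql-9.5 --data-directory=/dane/bazy/postgresql/9.5/ --unix-socket-directories=/run/postgresql
8365;/anon_hugepage;postgres: wal writer process
"""


def parse_machine_output(text):
    """Split each line into (pid, reported_file, command).

    Assumption: exactly three ';'-separated fields; everything after
    the second ';' is treated as the command line.
    """
    records = []
    for line in text.splitlines():
        if not line.strip():
            continue
        pid, reported_file, command = line.split(";", 2)
        records.append((int(pid), reported_file, command))
    return records


def pids_by_file(records):
    """Group PIDs by the file they are reported to hold open."""
    grouped = defaultdict(list)
    for pid, reported_file, _command in records:
        grouped[reported_file].append(pid)
    return dict(grouped)


print(pids_by_file(parse_machine_output(SAMPLE)))
```

Grouping by the second field shows that every postgres process is reported for the same entry, /anon_hugepage, rather than for a real library.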
Comment 1 Marc Schiffbauer gentoo-dev 2018-05-03 07:40:35 UTC
Hi Marcin,

Thanks for reporting! What's the output of lib_users without "-m"?
Comment 2 Marcin Mirosław 2018-05-03 09:44:44 UTC
Hi Marc!
This is the output of lib_users:

12398,9415 "-su"
14670 "postgres: dovecot exim 127.0.0.1(23974) idle"
14671 "postgres: dovecot exim 127.0.0.1(23976) idle"
9427 "postgres: writer process"
14191 "/sbin/agetty 38400 tty2 linux"
14206 "/sbin/agetty 38400 tty1 linux --noclear"
14147 "/sbin/agetty 38400 tty4 linux"
22326 "ekg -i -u 1034266"
14179 "/sbin/agetty 38400 tty3 linux"
5059 "tmux"
3669 "/usr/sbin/openvpn --config /etc/openvpn/mejor-serwer.conf --writepid /var/run/openvpn.mejor-serwer.pid --daemon --setenv SVCNAME openvpn.mejor-serwer --cd /etc/openvpn --setenv PEER_DNS yes"
12368,20739,5066,5072,5080,5083,5086 "-bash"
9429 "postgres: autovacuum launcher process"
12388,9414 "su -"
9422 "/usr/lib64/postgresql-9.5/bin/postgres -D /etc/postgresql-9.5 --data-directory=/dane/bazy/postgresql/9.5/ --unix-socket-directories=/run/postgresql"
9428 "postgres: wal writer process"
5124 "irssi"
9426 "postgres: checkpointer process"
14131 "/sbin/agetty 38400 tty6 linux"
Comment 3 Marc Schiffbauer gentoo-dev 2018-10-09 21:54:49 UTC
Sorry for the delay. Is this still an issue for you?

I don't see anything unusual in the lib_users output :-/

Please attach the log of "restart-services -dl"
Comment 4 Marcin Mirosław 2018-10-10 07:06:24 UTC
Yes, nothing has changed; the problem still exists on my box. Please look at:
 # lsof  -n | grep postgres | grep -i del
postgres    462              postgres  DEL       REG               0,14            189137919 /anon_hugepage
postgres    462              postgres  DEL       REG                0,5               557056 /SYSV0052e2c1
postgres    982              postgres  DEL       REG               0,14            189137919 /anon_hugepage
postgres    982              postgres  DEL       REG                0,5               557056 /SYSV0052e2c1
postgres   1001              postgres  DEL       REG               0,14            189137919 /anon_hugepage
postgres   1001              postgres  DEL       REG                0,5               557056 /SYSV0052e2c1
postgres  15687              postgres  DEL       REG               0,14            189137919 /anon_hugepage
postgres  15687              postgres  DEL       REG                0,5               557056 /SYSV0052e2c1
postgres  15690              postgres  DEL       REG               0,14            189137919 /anon_hugepage
postgres  15690              postgres  DEL       REG                0,5               557056 /SYSV0052e2c1
postgres  15691              postgres  DEL       REG               0,14            189137919 /anon_hugepage
postgres  15691              postgres  DEL       REG                0,5               557056 /SYSV0052e2c1
postgres  15692              postgres  DEL       REG               0,14            189137919 /anon_hugepage
postgres  15692              postgres  DEL       REG                0,5               557056 /SYSV0052e2c1
postgres  15693              postgres  DEL       REG               0,14            189137919 /anon_hugepage
postgres  15693              postgres  DEL       REG                0,5               557056 /SYSV0052e2c1

Maybe this triggers the restart of postgresql?
Comment 5 Marcin Mirosław 2018-10-10 07:09:12 UTC
And this is what I get if I disable hugepages in postgres:

# lsof  -n | grep postgres | grep -i del
postgres   9667              postgres  DEL       REG                0,5            189348486 /dev/zero
postgres   9667              postgres  DEL       REG                0,5               589824 /SYSV0052e2c1
postgres   9670              postgres  DEL       REG                0,5            189348486 /dev/zero
postgres   9670              postgres  DEL       REG                0,5               589824 /SYSV0052e2c1
postgres   9671              postgres  DEL       REG                0,5            189348486 /dev/zero
postgres   9671              postgres  DEL       REG                0,5               589824 /SYSV0052e2c1
postgres   9672              postgres  DEL       REG                0,5            189348486 /dev/zero
postgres   9672              postgres  DEL       REG                0,5               589824 /SYSV0052e2c1
postgres   9673              postgres  DEL       REG                0,5            189348486 /dev/zero
postgres   9673              postgres  DEL       REG                0,5               589824 /SYSV0052e2c1
postgres   9978              postgres  DEL       REG                0,5            189348486 /dev/zero
postgres   9978              postgres  DEL       REG                0,5               589824 /SYSV0052e2c1
# restart-services
-> Searching for services that need to be restarted ... done
=> sshd: Not restarting unclassified service (requires -u) ...


Now it is OK.
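The two lsof listings differ only in the name of one deleted mapping. A minimal Python sketch that makes the comparison explicit, assuming the whitespace-separated layout shown above (FD in the fourth field, NAME last). The helper name is hypothetical, and only two lines from each listing are used.

```python
# Two lines from each lsof listing above (one PID per run).
HUGEPAGES_ON = """\
postgres    462              postgres  DEL       REG               0,14            189137919 /anon_hugepage
postgres    462              postgres  DEL       REG                0,5               557056 /SYSV0052e2c1
"""
HUGEPAGES_OFF = """\
postgres   9667              postgres  DEL       REG                0,5            189348486 /dev/zero
postgres   9667              postgres  DEL       REG                0,5               589824 /SYSV0052e2c1
"""


def deleted_names(lsof_text):
    """Collect the NAME column of every line whose FD column is DEL.

    Assumption: fields are whitespace-separated, FD is the fourth
    field and NAME is the last one, as in the listings above.
    """
    names = set()
    for line in lsof_text.splitlines():
        fields = line.split()
        if len(fields) >= 5 and fields[3] == "DEL":
            names.add(fields[-1])
    return names


on = deleted_names(HUGEPAGES_ON)
off = deleted_names(HUGEPAGES_OFF)
print("only with hugepages:", sorted(on - off))
print("only without hugepages:", sorted(off - on))
```

The SysV segment appears in both runs, so the only name unique to the hugepages run is /anon_hugepage.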
Comment 6 Marc Schiffbauer gentoo-dev 2018-10-10 07:43:15 UTC
Maybe I need to add another exception for postgres here. Can you please attach the debug output (restart-services -ld) when huge_pages is enabled? TIA
Comment 7 Marcin Mirosław 2018-10-10 07:47:31 UTC
Here it is:

# restart-services -ld
debug: loading config file: /etc/restart-services.conf
debug: loading config file: /etc/restart-services.d/00-local.conf
-> Searching for services that need to be restarted ...  debug: analyzing lib_users output ...
Warning: Some files could not be read. Note that lib_users has to be run as
root to get a full list of deleted in-use libraries.
debug: checking pid 1575
debug: lu_pid set to its ppid: 1575
debug: process root is /
debug: no direct hit: process "/usr/lib64/postgresql-9.5/bin/postgres -D /etc/postgresql-9.5 --data-directory=/dane/bazy/postgresql/9.5/ --unix-socket-directories=/run/postgresql" (1575) has no RC_SERVICE env set, adding to TODO_PROCESSES
debug: checking pid 1591
debug: lu_pid set to its ppid: 1575
debug: skipping /usr/lib/postgresql-9.5/bin/postgres with pid 1575: we already found that ppid
debug: checking pid 1592
debug: lu_pid set to its ppid: 1575
debug: skipping /usr/lib/postgresql-9.5/bin/postgres with pid 1575: we already found that ppid
debug: checking pid 1593
debug: lu_pid set to its ppid: 1575
debug: skipping /usr/lib/postgresql-9.5/bin/postgres with pid 1575: we already found that ppid
debug: checking pid 1594
debug: lu_pid set to its ppid: 1575
debug: skipping /usr/lib/postgresql-9.5/bin/postgres with pid 1575: we already found that ppid
debug: checking pid 3766
debug: lu_pid set to its ppid: 3766
debug: process root is /
debug: no direct hit: process "ssh: /home/marcin/.ssh/sockets/root@10.10.10.33-22 [mux]" (3766) has no RC_SERVICE env set, adding to TODO_PROCESSES
debug: checking pid 7087
debug: lu_pid set to its ppid: 6014
debug: process root is /
debug: no direct hit: process "irssi" (6014) has no RC_SERVICE env set, adding to TODO_PROCESSES
debug: checking pid 7184
debug: lu_pid set to its ppid: 6014
debug: skipping /usr/bin/tmux with pid 6014: we already found that ppid
debug: checking pid 7272
debug: lu_pid set to its ppid: 6014
debug: skipping /usr/bin/tmux with pid 6014: we already found that ppid
debug: checking pid 7672
debug: lu_pid set to its ppid: 6014
debug: skipping /usr/bin/tmux with pid 6014: we already found that ppid
debug: checking pid 25708
debug: lu_pid set to its ppid: 25708
debug: process root is /
debug: no direct hit: process "ssh: /home/marcin/.ssh/sockets/root@eos.sprawy.it-22 [mux]" (25708) has no RC_SERVICE env set, adding to TODO_PROCESSES
debug: analyzing remaining processes (not direct hits) ...
debug: TODO_PROCESSES_EXE: /usr/bin/tmux /usr/bin/ssh /usr/lib/postgresql-9.5/bin/postgres /usr/bin/ssh
debug: checking exe /usr/bin/tmux (pid 6014)
debug: pid 6014 does not look like a script
debug: found package: app-misc/tmux
debug: no init scripts found.
debug: adding to UNKNOWN_PROCESSES: /usr/bin/tmux (6014)
debug: checking exe /usr/bin/ssh (pid 3766)
debug: pid 3766 does not look like a script
debug: found package: net-misc/openssh
debug: found init scripts: /etc/init.d/sshd
debug: found started service, adding init script: /etc/init.d/sshd
debug: checking exe /usr/lib/postgresql-9.5/bin/postgres (pid 1575)
debug: pid 1575 does not look like a script
debug: found package: dev-db/postgresql
debug: found init scripts: /etc/init.d/postgresql-10 /etc/init.d/postgresql-9.5
debug: no started service found.
debug: found started service, adding init script: /etc/init.d/postgresql-9.5
debug: checking exe /usr/bin/ssh (pid 25708)
debug: pid 25708 does not look like a script
debug: found package: net-misc/openssh
debug: found init scripts: /etc/init.d/sshd
debug: found started service, adding init script: /etc/init.d/sshd
done
=> Found 2 services that need to be restarted or reloaded
=-> unclassified services (2): postgresql-9.5 sshd
=> Found 0 inittab processes that need to be restarted
debug: looking for false positives in unknown processes
debug: looking at unknown process "irssi" (6014)

=> Warning: The following running processes did not match any init script and have been updated or deleted:
"irssi" (6014)
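The first half of the debug log collapses each reported PID onto an owning PID and skips owners it has already seen. A minimal Python sketch of that de-duplication step, using the PID pairs from the "checking pid" / "lu_pid set to its ppid" lines above. This only illustrates the behaviour visible in the log and is not the tool's actual implementation; the function name is hypothetical.

```python
# (reported pid, owning pid) pairs read off the debug log above.
REPORTED = [
    (1575, 1575), (1591, 1575), (1592, 1575), (1593, 1575), (1594, 1575),
    (3766, 3766),
    (7087, 6014), (7184, 6014), (7272, 6014), (7672, 6014),
    (25708, 25708),
]


def collapse_to_owner(pairs):
    """Return owning PIDs in first-seen order, skipping repeats."""
    seen = set()
    owners = []
    for _pid, owner in pairs:
        if owner in seen:
            # Corresponds to "skipping ...: we already found that ppid".
            continue
        seen.add(owner)
        owners.append(owner)
    return owners


print(collapse_to_owner(REPORTED))
```

Eleven reported PIDs reduce to four owners, which matches the four entries listed in TODO_PROCESSES_EXE.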
Comment 8 Marc Schiffbauer gentoo-dev 2018-11-19 09:11:22 UTC
Hi Marcin,

I could now reproduce this on my machine. Unfortunately, this seems to be a bug in lib_users, which reports "/anon_hugepage" as a deleted library.

Postgres with "huge_pages = on":

~ # lib_users -m|grep postgres
30014;/anon_hugepage;postgres: wal writer process
30013;/anon_hugepage;postgres: writer process
30008;/anon_hugepage;/usr/lib64/postgresql-9.6/bin/postgres -p 5432 -D /var/lib/postgresql/9.6/data
30012;/anon_hugepage;postgres: checkpointer process
30015;/anon_hugepage;postgres: autovacuum launcher process

and with "huge_pages = off", there is no output for postgres at all.

Let's see what Tobias thinks about this.
Comment 9 Tobias Klausmann (RETIRED) gentoo-dev 2018-11-19 09:19:17 UTC
Fixed in git:

https://github.com/klausman/lib_users/commit/0586f1605aed0f56a957032dda704a43d40f6bf9

This will be in the next release. Until then, lib_users can be made to ignore /anon_hugepage by using the -I commandline option.
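A minimal Python sketch of what an ignore pattern does to a list of reported paths, assuming the value given to -I is treated as a regular expression searched against each path. How lib_users applies the pattern internally may differ, and the libssl path is a made-up example entry.

```python
import re


def drop_ignored(paths, pattern):
    """Return the paths that do not match the ignore pattern.

    Assumption: the pattern is a regular expression searched
    anywhere in each reported path.
    """
    ignore = re.compile(pattern)
    return [path for path in paths if not ignore.search(path)]


# "/usr/lib64/libssl.so.1.0.0" is a hypothetical deleted library,
# included only to show that real hits survive the filter.
reported = ["/anon_hugepage", "/usr/lib64/libssl.so.1.0.0"]
print(drop_ignored(reported, r"/anon_hugepage"))
```

With the hugepage entry filtered out, a process that maps nothing else deleted no longer shows up at all.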
Comment 10 Larry the Git Cow gentoo-dev 2018-11-19 09:25:57 UTC
The bug has been referenced in the following commit(s):

https://gitweb.gentoo.org/repo/gentoo.git/commit/?id=3a03a4a7ea282a7fd9da74be187b15826fb654c2

commit 3a03a4a7ea282a7fd9da74be187b15826fb654c2
Author:     Tobias Klausmann <klausman@gentoo.org>
AuthorDate: 2018-11-19 09:25:34 +0000
Commit:     Tobias Klausmann <klausman@gentoo.org>
CommitDate: 2018-11-19 09:25:34 +0000

    app-admin/lib_users: Add v0.13, drop v0.12
    
    Bug: https://bugs.gentoo.org/648356
    Package-Manager: Portage-2.3.51, Repoman-2.3.12
    Signed-off-by: Tobias Klausmann <klausman@gentoo.org>

 app-admin/lib_users/Manifest                                         | 2 +-
 app-admin/lib_users/{lib_users-0.12.ebuild => lib_users-0.13.ebuild} | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)
Comment 11 Marc Schiffbauer gentoo-dev 2018-11-19 10:29:52 UTC
(In reply to Tobias Klausmann from comment #9)
> Fixed in git:
> 
> https://github.com/klausman/lib_users/commit/0586f1605aed0f56a957032dda704a43d40f6bf9
> 
> This will be in the next release. Until then, lib_users can be made to
> ignore /anon_hugepage by using the -I commandline option.

Thanks Tobias for this and the new release ;-)

I added "-I /anon_hugepage" to the lib_users call in restart-services 0.14.2. I will remove it again in a later version, once lib_users <0.13 has vanished from most systems.

And thanks to Marcin for reporting!
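restart-services 0.14.2 passes the option unconditionally, as described above. For a wrapper script that wants to add it only for older lib_users releases, a version-gated variant could look like the following Python sketch. It assumes the installed version is already known as a dotted-integer string and that the caller wants the -m output; both function names are hypothetical.

```python
def parse_version(version):
    """Turn a dotted-integer version such as "0.12" into (0, 12).

    Assumption: no suffixes such as "_rc1"; those would raise ValueError.
    """
    return tuple(int(part) for part in version.split("."))


def lib_users_argv(installed_version):
    """Build the lib_users command line for the given installed version."""
    argv = ["lib_users", "-m"]
    if parse_version(installed_version) < (0, 13):
        # Workaround from this bug: ignore the hugepage mapping on
        # releases older than the fixed 0.13.
        argv += ["-I", "/anon_hugepage"]
    return argv


print(lib_users_argv("0.12"))
print(lib_users_argv("0.13"))
```

Comparing tuples rather than strings keeps a later release such as 0.100 ordered after 0.13.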
Comment 12 Marcin Mirosław 2018-11-19 10:34:00 UTC
Quick fix! Thank you.