Bug 440234

Summary:	slapd init script crashes on grsec-kernel (openldap 2.4.30)
Product:	Gentoo Linux	Reporter:	bogdan <colegu>
Component:	Current packages	Assignee:	Gentoo LDAP project <ldap-bugs>
Status:	RESOLVED WORKSFORME
Severity:	normal	CC:	hardened-kernel+disabled, hardened, pageexec, spender
Priority:	Normal
Version:	unspecified
Hardware:	AMD64
OS:	Linux
Whiteboard:
Package list:		Runtime testing required:	---
Attachments:	relevant grsec options in the .config file; emerge --info

Description bogdan 2012-10-30 09:04:42 UTC

If /etc/init.d/slapd restart is invoked, the following relevant entries appear in kern.log:

Oct 29 21:11:21 blaaa kernel: grsec: bruteforce prevention initiated against uid 439, banning for 15 minutes
Oct 29 21:11:25 blaaa kernel: grsec: From x.x.x.x: Segmentation fault occurred at 000000000000017d in /usr/lib64/openldap/slapd[slapd:14618] uid/euid:0/0 gid/egid:439/439, parent /sbin/rc[start-stop-daem:14617] uid/euid:0/0 gid/egid:0/0

Consequently, slapd becomes unavailable for 15 minutes.
If slapd is started normally, without invoking the init script, then everything works just fine.

Comment 1 Anthony Basile gentoo-dev

2012-11-02 03:36:55 UTC

Can you give me your emerge --info, your kernel .config, as well as the version of openldap, and what USE flags you used to compile it?

As a work around, you can disable CONFIG_GRKERNSEC_BRUTE, but we should get this working right.

Comment 2 bogdan 2012-11-02 08:59:02 UTC

Created attachment 328044 [details]
relevant grsec options in the .config file

Comment 3 bogdan 2012-11-02 08:59:19 UTC

Created attachment 328046 [details]
emerge --info

Comment 4 bogdan 2012-11-02 09:00:30 UTC

(In reply to comment #1)
> Can you give me your emerge --info, your kernel .config, as well as the
> version of openldap, and what USE flags you used to compile it.
> 
> As a work around, you can disable CONFIG_GRKERNSEC_BRUTE, but we should get
> this working right.

As already stated, the workaround is to start the slapd (openldap-2.4.30) daemon directly from console, not from the init script. There is no point in disabling the feature if things can work. 
All other services that are also started and stopped via start-stop-daemon work fine: apache, postfix, dovecot, syslog, ntpd, postgres, etc.

I'll attach emerge --info as well as the relevant grsec options enabled in the kernel.
Thanx a lot for your help

bogdan

Comment 5 Anthony Basile gentoo-dev

2012-11-02 10:24:08 UTC

I'm not able to reproduce this, but I'm also not surprised.

What use flags did you use when compiling openldap? Also, is your directory large? Or how might it differ from just a fresh install, which is how I tried to reproduce.

I'm cc-ing upstream as they might have a clue.

Comment 6 bogdan 2012-11-02 10:48:33 UTC

(In reply to comment #5)
> I'm not able to reproduce this, but I'm also not surprised.
> 
> What use flags did you use when compiling openldap? Also, is your directory
> large? Or how might it differ from just a fresh install, which is how I
> tried to reproduce.
> 
> I'm cc-ing upstream as they might have a clue.

It might be a problem with the USE flags. Actually, I didn't pay too much attention when I emerged it, as there are a bunch of USE flags there that I don't actually use. The directory is rather small, maybe 400K: a couple of hundred users, some mailing lists and so on.
However, I don't suspect a problem with ldap itself, even if it differs from a fresh install, since it just works when I start it from the console. I might be wrong, though.
I should add that I use the redhat automount schema, which is not in the default install. But the problem occurs even if I comment it out.

[ebuild   R    ] net-nds/openldap-2.4.30  USE="berkdb crypt cxx gnutls iodbc kerberos odbc overlays perl sasl slp ssl syslog tcpd -debug -experimental -icu -ipv6 -minimal -samba (-selinux) -smbkrb5passwd" 0 kB

Thanx for your help
bogdan

Comment 7 PaX Team 2012-11-02 12:41:52 UTC

what's the difference between starting slapd from the commandline directly vs. the init script? based on that difference, you should be able to reproduce the segfault when running the daemon from gdb and then you can debug the segfault (in particular, we'd need a backtrace, disasm, register context, etc). also can you check what happens under a vanilla kernel?
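
A minimal sketch of how the requested backtrace, register context and disassembly could be collected in one gdb run. The script path and the slapd arguments are assumptions for illustration (the arguments mirror the manual command line quoted elsewhere in this report), and the final command is only printed here, not executed:

```shell
# Write a gdb command file covering the items requested above.
# The file name is hypothetical; any writable path works.
gdb_script="${TMPDIR:-/tmp}/slapd-segv.gdb"
cat > "$gdb_script" <<'EOF'
run
bt
info registers
x/8i $pc
EOF

# Print (do not run) the gdb invocation. -d -1 keeps slapd in the
# foreground with full debug output; the other arguments are assumed.
echo "gdb -batch -x $gdb_script --args /usr/lib64/openldap/slapd -d -1 -u ldap -g ldap -f /etc/openldap/slapd.conf -h ldaps://"
```

Without debug symbols for slapd and its libraries, the frames will probably show up as `??`, so the output may be of limited use until symbols are available.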

Comment 8 bogdan 2012-11-02 16:01:17 UTC

(In reply to comment #7)
> what's the difference between starting slapd from the commandline directly
> vs. the init script? based on that difference, you should be able to
> reproduce the segfault when running the daemon from gdb and then you can
> debug the segfault (in particular, we'd need a backtrace, disasm, register
> context, etc). also can you check what happens under a vanilla kernel?

As stated above, when running from the command line slapd just works. If I kill the process and start it again, I don't get any grsec bruteforce error or anything else. When I run it from the init script, it prints the errors mentioned earlier. I presume that it would just work under a vanilla kernel, because it also works under a grsec kernel.

If I issue /etc/init.d/slapd start
the program starts (as expected).
If I issue /etc/init.d/slapd stop
the program stops.
If, within the 15-minute timeframe, I try to start the openldap daemon again,
I get the errors
bruteforce prevention initiated against uid 439, banning for 15 minutes
Segmentation fault occurred at 000000000000017d in /usr/lib64/openldap/slapd[slapd:14618] uid/euid:0/0 gid/egid:439/439, parent /sbin/rc[start-stop-daem:14617] uid/euid:0/0 gid/egid:0/0
If I wait 15 minutes, the program can be started again.

if, within those 15 minutes I start the program from console, invoking slapd with all the arguments and so on, the program starts.

Comment 9 PaX Team 2012-11-02 16:35:17 UTC

(In reply to comment #8)
> if, within those 15 minutes I start the program from console, invoking slapd
> with all the arguments and so on, the program starts.

this is what's important: is there a difference between the command line params/etc issued by the startup script vs. what you do by hand? there must be because you said that whatever the startup script does will make slapd segfault whereas your manually constructed command line does not. once we have that difference, you should be able to provoke the segfault from the command line as well and debug the whole thing under gdb. alternatively, you could also enable coredumping (ulimit -c unlimited) and analyze the coredump.
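
Core dumps are governed by the core-file size limit (`ulimit -c`), not the stack limit. The following is a small sketch, not a definitive procedure: `describe_core_limit` is a hypothetical helper that classifies the current limit, and the attempt to raise it is allowed to fail, since a hard limit of 0 cannot be lifted from an unprivileged shell.

```shell
# Hypothetical helper: classify a value as printed by `ulimit -c`.
describe_core_limit() {
    case "$1" in
        0)         echo "disabled" ;;
        unlimited) echo "unlimited" ;;
        *)         echo "limited" ;;
    esac
}

# Try to raise the soft limit for this shell and its children.
# Ignore failure: the hard limit may be lower.
ulimit -c unlimited 2>/dev/null || true

echo "core dumps: $(describe_core_limit "$(ulimit -c)")"
```

If the limit is not `disabled`, a crash of slapd started from the same shell should leave a core file that can be opened with `gdb /usr/lib64/openldap/slapd <corefile>`; where the file ends up depends on the kernel's core pattern setting.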

Comment 10 bogdan 2012-11-02 16:57:50 UTC

(In reply to comment #9)
> (In reply to comment #8)
> > if, within those 15 minutes I start the program from console, invoking slapd
> > with all the arguments and so on, the program starts.
> 
> this is what's important: is there a difference between the command line
> params/etc issued by the startup script vs. what you do by hand? there must
> be because you said that whatever the startup script does will make slapd
> segfault whereas your manually constructed command line does not. once we
> have that difference, you should be able to provoke the segfault from the
> command line as well and debug the whole thing under gdb. alternatively, you
> could also enable coredumping (ulimit -c unlimited) and analyze the coredump.

Manually I start slapd like this:
  /usr/lib64/openldap/slapd -u ldap -g ldap -f /etc/openldap/slapd.conf -h ldaps:// 

Within the init script, the relevant line looks like this 
 
eval start-stop-daemon --start --pidfile /var/run/openldap/${SVCNAME}.pid --exec /usr/lib64/openldap/slapd -- -u ldap -g ldap "${OPTS}"

I'll try to reproduce the error on another machine, since that is a production server and I don't really want to keep it offline since it works (as long as the slapd service is not restarted from the init script, and it's not)

Thanx for your help
bogdan
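
One way to see exactly what the init script would execute is to keep its `eval` line but put `echo` in front of `start-stop-daemon`. This is only a sketch: the value of `OPTS` below is an assumption chosen to match the manual command line, and on the real system it would come from the service's own configuration.

```shell
SVCNAME=slapd
# Assumed value for illustration; the real OPTS is set outside the init script.
OPTS="-f /etc/openldap/slapd.conf -h ldaps://"

# Same line as in the init script, with echo added so nothing is started.
# eval re-parses the joined string, so OPTS is split into separate words.
show_init_cmdline() {
    eval echo start-stop-daemon --start --pidfile "/var/run/openldap/${SVCNAME}.pid" --exec /usr/lib64/openldap/slapd -- -u ldap -g ldap "${OPTS}"
}

show_init_cmdline
```

Everything after the `--` is what slapd itself receives, so that part can be compared word for word with the manual invocation.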

Comment 11 bogdan 2012-11-02 19:39:50 UTC

This is incredibly strange. I can't reproduce the bug. I installed the same kernel on my laptop, with the same ldap directory, same config file, same everything except userland and so on, and I can't reproduce the bug either...

Comment 12 Matthew Thode ( prometheanfire ) archtester

2012-11-02 19:42:14 UTC

I'm guessing it's the load on the ldap server that is causing the slapd to be blacklisted by grsec.  Doubt you can switch everyone to point at your laptop, but that would be one way to test.

Comment 13 bogdan 2012-11-02 20:05:17 UTC

I can set up another ldap server as a short backup, but I'll do that on monday, since this is really not an urgent matter.

Thanx a lot guys,
bogdan

Comment 14 bogdan 2012-11-05 14:56:48 UTC

Hi guys,

Some updates.
I've emerged openldap-2.4.31-r1. Starting the program with /etc/init.d/slapd start gives a segfault. Invoking it from the console with /usr/lib64/openldap/slapd -u ldap -g ldap -f /etc/openldap/slapd.conf -h ldaps:// also segfaults. The same thing now happens with the stable openldap (2.4.30): it segfaults regardless of how I start the program. So what I stated in my previous posts no longer holds; maybe I didn't pay attention to the 15-minute ban interval there and invoked the program from the console outside that timeframe. I then emerged an older version, openldap 2.4.19, and everything works: starting from /etc/init.d/slapd, stopping, restarting, etc. The same from the console: 2.4.19 works, with not a single error line in /var/log/grsec.log.

Now, some debugging (I've done pretty much no gdb debugging so far):

(gdb) set args -d -1 -u ldap -g ldap -f /etc/openldap/slapd.conf -h ldaps://
(gdb) run
(a lot of rubbish)
ldap_msgfree
5097c631 Could not set real user id to 439

Program received signal SIGSEGV, Segmentation fault.
0x00000339b8aa7fe4 in ?? ()
(gdb) backtrace
#0  0x00000339b8aa7fe4 in ?? ()
#1  0x00000339b6c34e20 in ?? ()
#2  0x000000000000001d in ?? ()
#3  0x26b2099e755bdd00 in ?? ()
#4  0x000003b43b00ee70 in ?? ()
#5  0x000000000000001d in ?? ()
#6  0x000003b43b00ee70 in ?? ()
#7  0x00000339ba478b3c in ?? () from /lib64/ld-linux-x86-64.so.2
Backtrace stopped: previous frame inner to this frame (corrupt stack?)


strace gives the same thing:
setuid(439)                             = -1 EPERM (Operation not permitted)
sendto(3, "<167>Nov  5 15:54:11 slapd[8845]"..., 69, MSG_NOSIGNAL, NULL, 0) = 69
--- SIGSEGV {si_signo=SIGSEGV, si_code=SEGV_MAPERR, si_addr=0x17d} ---
+++ killed by SIGSEGV +++
Segmentation fault

and the relevant grsec.log

Nov  5 14:49:09 blaaa kernel: grsec: From x.x.x.x: denied resource overstep by requesting 4096 for RLIMIT_CORE against limit 0 for /usr/lib64/openldap/slapd[slapd:32283] uid/euid:0/0 gid/egid:0/0, parent /bin/bash[bash:25046] uid/euid:0/0 gid/egid:0/0
Nov  5 14:49:52 blaaa kernel: grsec: From x.x.x.x: Segmentation fault occurred at 000000000000017d in /usr/lib64/openldap/slapd[slapd:32294] uid/euid:0/0 gid/egid:55/55, parent /usr/bin/gdb[gdb:32288] uid/euid:0/0 gid/egid:0/0
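
As a reading aid for the strace excerpt above, the sketch below filters a trace down to the calls that returned -1 with an errno name and converts the faulting address to decimal. `failed_syscalls` is a hypothetical helper, and the sample input is just the excerpt from this comment; note that not every such line is a real failure in general (a non-blocking connect returning EINPROGRESS is normal).

```shell
# Hypothetical helper: keep only lines where a syscall returned -1 with an errno.
failed_syscalls() {
    grep -E '= -1 E[A-Z]+'
}

# Sample input: the strace lines quoted in this comment.
failed_syscalls <<'EOF'
setuid(439)                             = -1 EPERM (Operation not permitted)
sendto(3, "<167>Nov  5 15:54:11 slapd[8845]"..., 69, MSG_NOSIGNAL, NULL, 0) = 69
--- SIGSEGV {si_signo=SIGSEGV, si_code=SEGV_MAPERR, si_addr=0x17d} ---
EOF

# Convert the faulting address (si_addr) to decimal.
printf '%d\n' 0x17d
```

In this excerpt the only failing call before the SIGSEGV is `setuid(439)`, and the faulting address is a very small number. Such low addresses are often what a field access through a NULL pointer looks like, though that is only a guess without symbols.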

Comment 15 bogdan 2012-11-05 21:27:45 UTC

So, it seems that the problem is fixed. I reduced the USE flags and compiled openldap again. I kept only the minimum (berkdb, ssl, syslog and tcpd), because this is basically what I need. I was too lazy to strip down the USE flags from the beginning, since I'm no fan of editing a lot of entries in /etc/portage/package.use (even so, the system is pretty slim, without much unneeded stuff).

What I can say so far, as far as my knowledge permits, is that this was definitely not a grsec/pax/hardened bug! Most probably the bruteforce prevention was activated because these lines appear in the strace output when the program is invoked within the 15-minute ban timeframe:

connect(8, {sa_family=AF_INET, sin_port=htons(636), sin_addr=inet_addr("server_ip.x.x.x")}, 16) = -1 EINPROGRESS (Operation now in progress)
poll([{fd=8, events=POLLOUT|POLLERR|POLLHUP}], 1, 120000) = 1 ([{fd=8, revents=POLLOUT|POLLERR|POLLHUP}])
getpeername(8, 0x3a09bc67650, [16])     = -1 ENOTCONN (Transport endpoint is not connected)
read(8, 0x3a09bc6764f, 1)               = -1 ECONNREFUSED (Connection refused)
shutdown(8, SHUT_RDWR)                  = -1 ENOTCONN (Transport endpoint is not connected)

It seems that slapd wants to read the config from /etc/ldap.conf and tries to bind to that address (at least this is how I interpret it), but that's the client configuration, not the server one. I've checked and double-checked the config files and nothing seems strange there. What bothers me is that when I compiled openldap with the same USE flags as before on my laptop, on the same kernel, I didn't get anything strange.
And even here people were not able to reproduce the bug. Maybe there's a stale config file on my server that slapd _really_ wants to read and that isn't there, and after reducing the USE flags that file is no longer needed. But why was that file not needed once the 15-minute ban had expired? That's strange...
Luckily I solved the problem; it now works with version 2.4.30.
So, as advice to people who might experience the same issue: keep your USE flags as few as you can. This might save you a lot of time.

Many thanx !
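
For anyone wanting to try the same reduction, the sketch below builds a package.use line from the previously enabled flags and the four flags that were kept. `make_package_use` is a hypothetical helper written for this illustration; the actual package.use entry used on the reporter's system was not posted, and it is not known which individual flag made the difference.

```shell
# Hypothetical helper: print a package.use line that keeps the flags listed
# in $2 and explicitly disables every other flag passed after it.
make_package_use() {
    pkg=$1
    keep=$2
    shift 2
    line=$pkg
    for flag in "$@"; do
        case " $keep " in
            *" $flag "*) line="$line $flag" ;;
            *)           line="$line -$flag" ;;
        esac
    done
    echo "$line"
}

# Flags previously enabled (from the ebuild line in comment 6), keeping four.
make_package_use net-nds/openldap "berkdb ssl syslog tcpd" \
    berkdb crypt cxx gnutls iodbc kerberos odbc overlays perl sasl slp ssl syslog tcpd
```

The printed line could be placed in /etc/portage/package.use before rebuilding the package; re-enabling flags one at a time afterwards would be one way to find which of them triggers the crash.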

Comment 16 Tully Gray 2012-11-07 01:42:54 UTC

I've seen very similar behaviour with slapd, however this bug no longer occurs on my system so I haven't looked into it any further.