Summary: | slapd init script crashes on grsec-kernel (openldap 2.4.30) | ||
---|---|---|---|
Product: | Gentoo Linux | Reporter: | bogdan <colegu> |
Component: | Current packages | Assignee: | Gentoo LDAP project <ldap-bugs> |
Status: | RESOLVED WORKSFORME | ||
Severity: | normal | CC: | hardened-kernel+disabled, hardened, pageexec, spender |
Priority: | Normal | ||
Version: | unspecified | ||
Hardware: | AMD64 | ||
OS: | Linux | ||
Whiteboard: | |||
Package list: | Runtime testing required: | --- | |
Attachments: |
relevant grsec options in the .config file
emerge --info |
Description
bogdan
2012-10-30 09:04:42 UTC
Can you give me your emerge --info, your kernel .config, as well as the version of openldap, and what USE flags you used to compile it? As a workaround, you can disable CONFIG_GRKERNSEC_BRUTE, but we should get this working right.

Created attachment 328044 [details]
relevant grsec options in the .config file
Created attachment 328046 [details]
emerge --info
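The workaround suggested above is to disable CONFIG_GRKERNSEC_BRUTE. Before rebuilding, one can check whether the option is actually set in a given kernel config. A minimal sketch, assuming the config is available as a plain file such as /usr/src/linux/.config (a running kernel exposes /proc/config.gz only if it was built with CONFIG_IKCONFIG_PROC); `brute_enabled` is a hypothetical helper name:

```shell
#!/bin/sh
# brute_enabled: hypothetical helper. Exits 0 if the given kernel config file
# contains CONFIG_GRKERNSEC_BRUTE=y, non-zero otherwise.
brute_enabled() {
    grep -q '^CONFIG_GRKERNSEC_BRUTE=y$' "$1"
}

# Usage (path is an assumption; adjust to your kernel source tree):
# if brute_enabled /usr/src/linux/.config; then
#     echo "grsec brute-force deterrence is enabled"
# fi
```

When the option is disabled, Kconfig writes the line `# CONFIG_GRKERNSEC_BRUTE is not set` instead, which the pattern above does not match.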
(In reply to comment #1)
> Can you give me your emerge --info, your kernel .config, as well as the
> version of openldap, and what USE flags you used to compile it?
>
> As a workaround, you can disable CONFIG_GRKERNSEC_BRUTE, but we should get
> this working right.

As already stated, the workaround is to start the slapd (openldap-2.4.30) daemon directly from the console, not from the init script. There is no point in disabling the feature if things can work. All other services that also use start-stop-daemon to start and stop them work fine: apache, postfix, dovecot, syslog, ntpd, postgres, etc. I'll attach emerge --info as well as the relevant grsec options enabled in the kernel.

Thanks a lot for your help,
bogdan

I'm not able to reproduce this, but I'm also not surprised. What USE flags did you use when compiling openldap? Also, is your directory large? Or how might it differ from just a fresh install, which is how I tried to reproduce it? I'm cc-ing upstream as they might have a clue.

(In reply to comment #5)
> I'm not able to reproduce this, but I'm also not surprised.
>
> What USE flags did you use when compiling openldap? Also, is your directory
> large? Or how might it differ from just a fresh install, which is how I
> tried to reproduce it?
>
> I'm cc-ing upstream as they might have a clue.

It might be a problem with the USE flags; actually, I didn't pay much attention when I emerged it, as there are a bunch of USE flags there that I don't actually use. The directory is rather small, maybe 400K: a couple of hundred users, some mailing lists and so on. However, I don't suspect a problem with LDAP itself, even if it differs from a fresh install, since when I start it from the console it just works. I might be wrong, though. I have to add that I use the Red Hat automount schema, which is not in the default install. But the problem occurs even if I comment it out.
[ebuild R ] net-nds/openldap-2.4.30 USE="berkdb crypt cxx gnutls iodbc kerberos odbc overlays perl sasl slp ssl syslog tcpd -debug -experimental -icu -ipv6 -minimal -samba (-selinux) -smbkrb5passwd" 0 kB

Thanks for your help,
bogdan

What's the difference between starting slapd from the command line directly vs. the init script? Based on that difference, you should be able to reproduce the segfault when running the daemon from gdb, and then you can debug the segfault (in particular, we'd need a backtrace, disasm, register context, etc.). Also, can you check what happens under a vanilla kernel?

(In reply to comment #7)
> What's the difference between starting slapd from the command line directly
> vs. the init script? Based on that difference, you should be able to
> reproduce the segfault when running the daemon from gdb, and then you can
> debug the segfault (in particular, we'd need a backtrace, disasm, register
> context, etc.). Also, can you check what happens under a vanilla kernel?

As stated above, when running from the command line, slapd just works. If I kill the process and start it again, I don't get any bruteforce grsec error or anything else. When I run it from the init script, it prints the errors mentioned earlier. I presume that under a vanilla kernel it will just work, because it also works under a grsec kernel.

If I issue /etc/init.d/slapd start, the program starts (as expected).
If I issue /etc/init.d/slapd stop, the program stops.
If, within a 15-minute timeframe, I want to start the openldap daemon again, I get these errors:

bruteforce prevention initiated against uid 439, banning for 15 minutes
Segmentation fault occurred at 000000000000017d in /usr/lib64/openldap/slapd[slapd:14618] uid/euid:0/0 gid/egid:439/439, parent /sbin/rc[start-stop-daem:14617] uid/euid:0/0 gid/egid:0/0

If I wait 15 minutes, the program can be started again. If, within those 15 minutes, I start the program from the console, invoking slapd with all the arguments and so on, the program starts.
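The ban message quoted above can be pulled out of the grsec log to see which uids were banned and how often, which helps correlate bans with init-script restarts. A minimal sketch, assuming the message text is exactly as quoted above and that grsec messages are written to /var/log/grsec.log as on the reporter's system; `banned_uids` is a hypothetical helper name:

```shell
#!/bin/sh
# banned_uids: hypothetical helper. Prints the uid from every
# "bruteforce prevention initiated against uid N" line in the given log,
# one per line, in the order the bans appear.
banned_uids() {
    sed -n -E 's/.*bruteforce prevention initiated against uid ([0-9]+).*/\1/p' "$1"
}

# Usage: count bans per uid
# banned_uids /var/log/grsec.log | sort | uniq -c
```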
(In reply to comment #8)
> If, within those 15 minutes, I start the program from the console, invoking slapd
> with all the arguments and so on, the program starts.

This is what's important: is there a difference between the command-line parameters etc. issued by the startup script and what you do by hand? There must be, because you said that whatever the startup script does makes slapd segfault, whereas your manually constructed command line does not. Once we have that difference, you should be able to provoke the segfault from the command line as well and debug the whole thing under gdb. Alternatively, you could enable core dumps (ulimit -c unlimited) and analyze the core dump.

(In reply to comment #9)
> (In reply to comment #8)
> > If, within those 15 minutes, I start the program from the console, invoking slapd
> > with all the arguments and so on, the program starts.
>
> This is what's important: is there a difference between the command-line
> parameters etc. issued by the startup script and what you do by hand? There must
> be, because you said that whatever the startup script does makes slapd
> segfault, whereas your manually constructed command line does not. Once we
> have that difference, you should be able to provoke the segfault from the
> command line as well and debug the whole thing under gdb. Alternatively, you
> could enable core dumps (ulimit -c unlimited) and analyze the core dump.
Manually, I start slapd like this:

/usr/lib64/openldap/slapd -u ldap -g ldap -f /etc/openldap/slapd.conf -h ldaps://

Within the init script, the relevant line looks like this:

eval start-stop-daemon --start --pidfile /var/run/openldap/${SVCNAME}.pid --exec /usr/lib64/openldap/slapd -- -u ldap -g ldap "${OPTS}"

I'll try to reproduce the error on another machine, since this is a production server and I don't really want to take it offline while it works (as long as the slapd service is not restarted from the init script, and it isn't).

Thanks for your help,
bogdan

This is incredibly strange. I can't reproduce the bug. I installed the same kernel on my laptop, with the same LDAP directory, same config file, same everything except userland and so on, and I can't reproduce the bug either...

I'm guessing it's the load on the LDAP server that is causing slapd to be blacklisted by grsec. I doubt you can switch everyone to point at your laptop, but that would be one way to test.

I can set up another LDAP server as a short-term backup, but I'll do that on Monday, since this is really not an urgent matter.

Thanks a lot, guys,
bogdan

Hi guys,

Some updates. I've emerged openldap-2.4.31-r1. Starting the program with /etc/init.d/slapd start gives a segfault. From the console, invoked with

/usr/lib64/openldap/slapd -u ldap -g ldap -f /etc/openldap/slapd.conf -h ldaps://

it also segfaults. The same thing now happens with the stable openldap (2.4.30): it segfaults regardless of how I start the program. So it seems that what I stated in my previous posts no longer holds; maybe I didn't pay attention to the 15-minute ban interval there and invoked the program from the console outside this timeframe. I emerged an older version of the program, openldap 2.4.19, and everything works: starting from /etc/init.d/slapd, stopping, restarting, etc.
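One mechanical difference between the two invocations above is that the init script passes its arguments through `eval`. `eval` joins its arguments with spaces and parses the result again, so the double quotes around "${OPTS}" do not keep OPTS as a single argument: its contents are word-split a second time, and any quotes inside it are honoured. A minimal sketch that makes this visible; `print_args` is a hypothetical helper, and the OPTS value is an assumed example of what might be set in /etc/conf.d/slapd, not the reporter's actual setting:

```shell
#!/bin/sh
# print_args: hypothetical helper that prints each argument it receives
# in brackets, one per line, so word splitting becomes visible.
print_args() {
    for arg in "$@"; do
        printf '[%s]\n' "$arg"
    done
}

# Assumed example value; on Gentoo this would normally come from /etc/conf.d/slapd.
OPTS="-f /etc/openldap/slapd.conf -h 'ldaps:// ldapi://'"

echo "without eval:"
print_args -u ldap -g ldap "${OPTS}"         # OPTS arrives as ONE argument

echo "with eval (as in the init script):"
eval print_args -u ldap -g ldap "${OPTS}"    # OPTS is re-split into 4 arguments
```

To see what the running daemon actually received, its argument vector can be read from procfs, e.g. `tr '\0' ' ' < /proc/"$(cat /var/run/openldap/slapd.pid)"/cmdline`, and compared with the manual command line.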
From the console, the same: 2.4.19 works, without a single error line in /var/log/grsec.log.

Now, some debugging (I've done pretty much no gdb debugging so far):

(gdb) set args -d -1 -u ldap -g ldap -f /etc/openldap/slapd.conf -h ldaps://
(gdb) run
(a lot of rubbish)
ldap_msgfree 5097c631
Could not set real user id to 439

Program received signal SIGSEGV, Segmentation fault.
0x00000339b8aa7fe4 in ?? ()
(gdb) backtrace
#0 0x00000339b8aa7fe4 in ?? ()
#1 0x00000339b6c34e20 in ?? ()
#2 0x000000000000001d in ?? ()
#3 0x26b2099e755bdd00 in ?? ()
#4 0x000003b43b00ee70 in ?? ()
#5 0x000000000000001d in ?? ()
#6 0x000003b43b00ee70 in ?? ()
#7 0x00000339ba478b3c in ?? () from /lib64/ld-linux-x86-64.so.2
Backtrace stopped: previous frame inner to this frame (corrupt stack?)

strace gives the same thing:

setuid(439) = -1 EPERM (Operation not permitted)
sendto(3, "<167>Nov 5 15:54:11 slapd[8845]"..., 69, MSG_NOSIGNAL, NULL, 0) = 69
--- SIGSEGV {si_signo=SIGSEGV, si_code=SEGV_MAPERR, si_addr=0x17d} ---
+++ killed by SIGSEGV +++
Segmentation fault

and the relevant grsec.log:

Nov 5 14:49:09 blaaa kernel: grsec: From x.x.x.x: denied resource overstep by requesting 4096 for RLIMIT_CORE against limit 0 for /usr/lib64/openldap/slapd[slapd:32283] uid/euid:0/0 gid/egid:0/0, parent /bin/bash[bash:25046] uid/euid:0/0 gid/egid:0/0
Nov 5 14:49:52 blaaa kernel: grsec: From x.x.x.x: Segmentation fault occurred at 000000000000017d in /usr/lib64/openldap/slapd[slapd:32294] uid/euid:0/0 gid/egid:55/55, parent /usr/bin/gdb[gdb:32288] uid/euid:0/0 gid/egid:0/0

So it seems that the problem is fixed: I reduced the USE flags and compiled openldap again.
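The first grsec.log line above shows why no core file was written: the core-file size limit (RLIMIT_CORE) was 0. The shell controls that limit with `ulimit -c` (not `-s`, which is the stack size). A minimal sketch for raising it in the shell that will start slapd, assuming bash or a POSIX sh whose `ulimit` supports `-c` and `-H`:

```shell
#!/bin/sh
# Set the core-file size limit (RLIMIT_CORE) of this shell to its hard limit,
# i.e. raise the soft limit as far as the hard limit allows.
ulimit -c "$(ulimit -H -c)"
ulimit -c    # prints the new limit, e.g. "unlimited"

# Caveat: a process that has changed its uid/gid (slapd does so for -u/-g)
# is normally marked non-dumpable; see the fs.suid_dumpable sysctl.
#
# Then start slapd from this same shell, as in the report:
#   /usr/lib64/openldap/slapd -u ldap -g ldap -f /etc/openldap/slapd.conf -h ldaps://
# and open the resulting core file (its name and location depend on
# /proc/sys/kernel/core_pattern):
#   gdb /usr/lib64/openldap/slapd core
```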
I kept only the minimal set, berkdb, ssl, syslog and tcpd, because that is basically what I need; I was too lazy to strip down the USE flags from the beginning, since I'm no fan of editing a lot of entries in /etc/portage/package.use (even so, the system is pretty slim, without much unneeded stuff).

What I can say so far, as far as my knowledge permits, is that this was definitely not a grsec/PaX/hardened bug! Most probably the bruteforce protection was triggered because, during strace, these lines appear when the program is invoked within the 15-minute ban timeframe:

connect(8, {sa_family=AF_INET, sin_port=htons(636), sin_addr=inet_addr("server_ip.x.x.x")}, 16) = -1 EINPROGRESS (Operation now in progress)
poll([{fd=8, events=POLLOUT|POLLERR|POLLHUP}], 1, 120000) = 1 ([{fd=8, revents=POLLOUT|POLLERR|POLLHUP}])
getpeername(8, 0x3a09bc67650, [16]) = -1 ENOTCONN (Transport endpoint is not connected)
read(8, 0x3a09bc6764f, 1) = -1 ECONNREFUSED (Connection refused)
shutdown(8, SHUT_RDWR) = -1 ENOTCONN (Transport endpoint is not connected)

It seems that slapd wants to read the config from /etc/ldap.conf and tries to connect to that address (at least this is how I interpret it), but that's the client configuration, not the server one. I've checked and double-checked the config files and nothing seems strange there. What bothers me is that when I compiled openldap with the same USE flags as before on my laptop, on the same kernel, I didn't see anything strange. And even here, people were not able to reproduce the bug. Maybe there's a stale config file on my server that slapd _really_ wants to read, and it's not there, and after reducing the USE flags that file is not needed anymore. But why was that file not needed after the 15-minute ban window? That's strange...

Luckily I solved the problem; now it works with version 2.4.30. So, as advice to people who might experience the same issues as me: keep your USE flags to a minimum.
This might save you a lot of time. Many thanks!

I've seen very similar behaviour with slapd; however, this bug no longer occurs on my system, so I haven't looked into it any further.
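Both leads in this thread, the failed setuid(439) with EPERM and the theory that slapd looks for a config file that is not there, can be checked against a full strace log by listing only the calls that failed. A minimal sketch, assuming a trace recorded to a file with something like `strace -f -o slapd.trace` and strace's usual `call(args) = -1 ERRNO (...)` layout; `failed_calls` and the trace file name are hypothetical:

```shell
#!/bin/sh
# failed_calls: hypothetical helper. Prints the lines of an strace log where a
# syscall whose name matches the extended regex $2 returned -1 with an errno.
# Works with or without the leading pid column that strace -f adds.
failed_calls() {
    grep -E "(^|[[:space:]])($2)\(.*\) += -1 E[A-Z]+" "$1"
}

# Usage (trace file name is an assumption):
# strace -f -o slapd.trace /usr/lib64/openldap/slapd -u ldap -g ldap \
#     -f /etc/openldap/slapd.conf -h ldaps://
# failed_calls slapd.trace 'set(re|res)?[ug]id'          # e.g. setuid(439) = -1 EPERM
# failed_calls slapd.trace 'open|openat' | grep ENOENT   # files looked for but missing
```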