Summary: | tpop3d crashes with "stack smashing attack in function connection_new()" when built with gcc-3.3.4-r1 | ||
---|---|---|---|
Product: | Gentoo Linux | Reporter: | Alexander Stoll <technoworx> |
Component: | [OLD] Server | Assignee: | The Gentoo Linux Hardened Team <hardened> |
Status: | RESOLVED OBSOLETE | ||
Severity: | critical | CC: | alex_y_xu, griffon26, kevquinn, vapier |
Priority: | High | ||
Version: | unspecified | ||
Hardware: | x86 | ||
OS: | Linux | ||
Whiteboard: | |||
Package list: | Runtime testing required: | --- | |
Attachments: | Removes the local blocks declared with the alloc_struct macro, thus avoiding a gcc-3.x SSP compiler bug |
Description
Alexander Stoll
2005-12-12 03:11:41 UTC
sorry for messing up the revision number of gcc in header... *grrr* short night... additional info via strace: open("/etc/hosts.allow", O_RDONLY) = -1 ENOENT (No such file or directory) open("/etc/hosts.deny", O_RDONLY) = -1 ENOENT (No such file or directory) setsockopt(5, SOL_SOCKET, SO_SNDBUF, [8192], 4) = 0 fcntl64(5, F_SETFL, O_RDONLY|O_NONBLOCK) = 0 getsockname(5, {sa_family=AF_INET, sin_port=htons(110), sin_addr=inet_addr("0.0 .0.0")}, [16]) = 0 open("/dev/urandom", O_RDONLY|O_LARGEFILE) = 6 read(6, " @-G\27\307\21<\357Hygq\364\324 ", 16) = 16 close(6) = 0 time(NULL) = 1134392024 write(5, "+OK <20402d4717c7113cef48796771f"..., 53) = 53 time(NULL) = 1134392024 rt_sigprocmask(SIG_BLOCK, ~[ABRT], NULL, 8) = 0 write(2, "tpop3d", 6tpop3d) = 6 write(2, ": stack smashing attack in funct"..., 36: stack smashing attack in fun ction ) = 36 write(2, "connection_new", 14connection_new) = 14 write(2, "()\n", 3() ) = 3 rt_sigaction(SIGABRT, {SIG_DFL}, NULL, 8) = 0 getpid() = 16811 kill(16811, SIGABRT) = 0 --- SIGABRT (Aborted) @ 0 (0) --- +++ killed by SIGABRT (core dumped) +++ BT is no help, because gdb has no symbols from external libs and a appropriate debug environment is not in place for more than only building the tpop3d binary with debug unformation and -pie ... Some other things I've noticed: - It occurs with -O2 and higher, but not if -fstrict-aliasing is in CFLAGS - Removing the call to make_timestamp() in connection_new() gets rid of the crash - If i add a char array of 113 or larger as the first local variable of make_timestamp(), the crash does not occur. If it's 112 bytes or smaller or if it is added at the end of the list of local vars, then it does occur. (In reply to comment #2) > - It occurs with -O2 and higher, but not if -fstrict-aliasing is in CFLAGS I can verify this on a few boxes, used FLAGS: -O3 -march=pentiumpro -fomit-frame-pointer -fstrict-aliasing -pipe -O3 -march=pentium3 -fomit-frame-pointer -fstrict-aliasing -pipe No crash... If nothing new comes up, please incorporate the CFLAG that circumvents this issue in an updated ebuild to restore proper function of compiled binaries. Maybe it would be another option to ask upstream to release 1.6.0 from the current cvs version, that is - according to mail list archive - already distributed by many others for fixing other bugs... If someone will provide an ebuild with cvs version, I will happily test it... Alexander, I'm not comfortable with adding -fstrict-aliasing, because there can be code that works without it, but not with it. And just because it doesn't crash, doesn't mean it does the right thing. Security people, I could really use some insight into this problem. (In reply to comment #5) > Alexander, I'm not comfortable with adding -fstrict-aliasing, because > there can be code that works without it, but not with it. And just because > it doesn't crash, doesn't mean it does the right thing. This is totally understood, I agree with you that this is the last and worst option that should be considered to keep the package working... I would be sleeping better if the cause of this bug is understood - and of course - fixed. I don (In reply to comment #5) > Alexander, I'm not comfortable with adding -fstrict-aliasing, because > there can be code that works without it, but not with it. And just because > it doesn't crash, doesn't mean it does the right thing. This is totally understood, I agree with you that this is the last and worst option that should be considered to keep the package working... I would be sleeping better if the cause of this bug is understood - and of course - fixed. I don´t want to bother or rush people, but this bug lurks around some time and I have the impression that the severity of this is underestimated. It may be not as widely deployed as apache, but it is identically exposed and people rely on pop3 service, so it deserves the same attention - my opinion. Even when I don´t have a debug environment handy, if the only measure to fix this soon is, that I provide a full backtrace, I will do the escessary work - and stop complaining - until it is available to the security herd... ;-) I hope my points are considered with benevolence, I don´t want to bug people... Security, please advice. connection.c:102:Bounds error: attempt to reference memory overrunning the end of an object. connection.c:102: Pointer value: 0x5d619f90, Size: 16 connection.c:102: Object `sinlocal': connection.c:102: Address in memory: 0x5d619f80 .. 0x5d619f8f connection.c:102: Size: 16 bytes connection.c:102: Element size: 1 bytes connection.c:102: Number of elements: 16 connection.c:102: Created at: netloop.c, line 125 connection.c:102: Storage class: stack quit: signal 6 post_fork = 0 Aborted Thats forcing -fno-striact-aliasing and with -fstack-protector -fbounds-checking enabled. Now with -fno-stack-protector -fno-strict-aliasing -fbounds-checking I cant make comment #7 happen again. So this almost appears to be a bug in ssp. Adding kevin quinn to the CC: list to ask him to take a look also. To quickly test, put these two lines in a tpop3d.conf: listen-address: 0.0.0.0:12345 auth-pam-enable: yes and start tpop3d with: ./tpop3d -f /path/to/tpop3d.conf -d Then connect to it: telnet localhost 12345 Hmm; definitely looks like a bug in SSP. First here's the preprocessed source at the start of connection_new: connection connection_new(int s, const struct sockaddr_in *sin, listener L) { int n; connection c = ((void *)0); do { struct _connection as__z = {0}; c = xmalloc(sizeof *c); *c = as__z; } while (0); ... Note the static initialiser '= {0}' is built by the compiler as a call to 'memset' on a block declared locally, that is then memcpy'd to the xmalloc'd pointer. Here's the assembler compiled with fno-stack-protector: connection_new: pushl %ebp movl $128, %eax movl %esp, %ebp pushl %edi pushl %esi pushl %ebx subl $588, %esp movl 12(%ebp), %ebx movl 8(%ebp), %edi movl %eax, 8(%esp) xorl %eax, %eax movl %eax, 4(%esp) leal -152(%ebp), %eax movl %eax, (%esp) call memset As we can see, the a frame of 588 bytes is setup, which includes the a stack buffer at 152 bytes below the frame pointer (ebp - stack pointer on entry). This is quite sufficient for the 128-byte _connection structure allocated with macro alloc_struct. However if we look at the output when built with -fstack-protector we get: connection_new: pushl %ebp movl %esp, %ebp leal -56(%ebp), %edx pushl %edi pushl %esi pushl %ebx subl $604, %esp movl __guard, %eax movl %edx, (%esp) movl 12(%ebp), %ebx movl 8(%ebp), %edi movl %eax, -40(%ebp) movl 16(%ebp), %eax movl %eax, -584(%ebp) movl $128, %eax movl %eax, 8(%esp) xorl %eax, %eax movl %eax, 4(%esp) call memset Things are a little more involved, but importantly the address passed to the memset call is only 56 bytes less than the frame pointer (see the leal instruction to edx), so when memset blats it with 128 (0x80) bytes of zeros, the guard gets zeroed along with a bunch of other stuff. This is clearly wrong. It seems the 3.x stack protector messes up references to buffers declared in local blocks while trying to reorder frame data to put buffers after scalars. Fixing the stack protector is another matter of course - but anything that can eliminate the buffers declared in local blocks may work. It's easy to patch the source to fix the problem and I'll attach a patch shortly that avoids the buffer in the local block, however this is a serious shortcoming of the 3.x stack protector. Incidentally, declaring stuff in local blocks is an extension to standard C so is probably relatively rarely used for C code, however it is standard C++. Perhaps this explains why we find SSP so unreliable on C++ code, but have relatively few problems with C code. I tried the original code with a 4.1.0 beta, which has a completely different stack-protector implementation, and this created code that is correct. However notably the 4.1.0 beta didn't reorder the data on the stack, which makes it weaker than the 3.x SSP implementation. So although the 4.1.0 stack protector generates working code, it's just not doing the part of the 3.x stack protector functionality that causes the problem in 3.x. Created attachment 79288 [details, diff]
Removes the local blocks declared with the alloc_struct macro, thus avoiding a gcc-3.x SSP compiler bug
This seems to work on the limited testing I've done here.
It also has added benefits; it reduces overall stack usage (for what that's worth :) ) and saves a block copy (memcpy) whenever the alloc_struct macro is used.
Fixing the GCC-3.x stack protector implementation itself is left as an exercise for the reader ;)
reassigned - it's not security problem, just a bug in the compiler. Maurice - up to you how to proceed; you could apply the patch, or you could filter fstack-protector. It might be worth seeing what upstream think of the patch. Not for the problem with SSP, but for code efficiency. Their code is possibly more portable (but not much - memset is standard in C90 at least), but the patched code is quicker and uses less stack. tpop3d's Chris Lightfoot wrote: Your implementation of alloc_struct is not valid. It assumes that all-bits-zero is a valid representation of any element of a struct on any architecture, and that that is the same representation as would be yielded by the = {0} declaration. There is no such guarantee in the C or POSIX standards, and while using memset(..., 0, ...) may work on your architecture it is not portable and therefore best avoided unless there is some compelling reason to use it. Kevin/hardened, what timeframe would we be talking about regarding a fix for the bug in SSP? I'll disable stack protector for tpop3d for now. I do assume the struct is contiguous, and had no idea '= {0}' could generate non-zero data.
> Kevin/hardened, what timeframe would we be talking about regarding a fix for
> the bug in SSP? I'll disable stack protector for tpop3d for now.
Well, upstream 3.x SSP author isn't responding to me so I can't say if we'll have any luck there. I've been learning gcc internals so hope to be able to fix it eventually, but it's a lot to learn.
In other words, don't hold your breath ;) I'll raise a separate bug to keep the SSP problem itself in mind so you can close this by filtering stack-protector; when you do so please add a comment referring to this bug so we know why the filtering is present.
Quick follow-up on the '={0}' thing; I had a look through the C90 standard to see how it could generate non-zero data, and it can occur on platforms where the null pointer is non-zero. The '0' in the initialiser is interpreted as a "null pointer constant" (always zero) for any pointer elements of the structure. The null pointer constant is converted into a null pointer - however the standard does not define the representation of a null pointer, leaving it to the implementation to choose a representation that it considers most sensible for the architecture. Relevant bits of ISO 9899-1999 are sections 6.7.8 paras 10 and 21, and for definition of 'null pointer constant see section 6.3.2.3 para 3. You can use this bug for SSP, because this report really is about that issue. Once this one is closed, I can remove filter-flags again from tpop3d. Has anyone tested this with GCC 4.1.1? For info - it looks fine when built with 4.1.2. It does re-order parameters (contrary to what I said before) so all in all it's a much better SSP implementation :) vapier - are you happy to have gcc-major-version() used to control the filtering according to compiler version, e.g.: [[ $(gcc-major-version) -eq 3 ]] && filter-flags -fstack-protector ? since we're looking at deprecating gcc-3.x once gcc-4.x hardened is in place, that sounds fine to me (In reply to comment #20) > since we're looking at deprecating gcc-3.x once gcc-4.x hardened is in place, > that sounds fine to me As far as I know qemu can't be built under gcc-4.x without patching. Patches which are not in place at gentoo. It would be a shame to lose the qemu package. what does that have to do with anything ? gcc-3.x is deprecated/not-supported now for non-hardened systems, it doesnt mean i'm removing them from the tree (In reply to comment #22) > it doesnt mean i'm removing them from the tree Ignore the last comment then. This hasn't been touched in years, is it OBSOLETE? Reopen if is still fail with gcc 4.X for we don't support gcc 3.x on hardened any longer. The bug has been referenced in the following commit(s): https://gitweb.gentoo.org/repo/gentoo.git/commit/?id=97a888d8fd580498c116c643ad10b6fe937abdb9 commit 97a888d8fd580498c116c643ad10b6fe937abdb9 Author: Sam James <sam@gentoo.org> AuthorDate: 2025-01-28 07:14:36 +0000 Commit: Sam James <sam@gentoo.org> CommitDate: 2025-01-28 07:15:35 +0000 net-mail/tpop3d: drop filter-flags -fno-stack-protector This was a bug in some ancient GCC when SSP was still brand-new and was confirmed to work w/ newer GCC (4.1.2) in https://bugs.gentoo.org/115285#c19 in 2007. Bug: https://bugs.gentoo.org/115285 Signed-off-by: Sam James <sam@gentoo.org> net-mail/tpop3d/tpop3d-1.5.5-r5.ebuild | 8 ++------ 1 file changed, 2 insertions(+), 6 deletions(-) |