233 – Emacs segfaults when merged through the sandbox.

Status:	RESOLVED TEST-REQUEST

Alias:	None

Product:	Gentoo Linux
Classification:	Unclassified
Component:	Current packages
Hardware:	x86 Linux

Importance:	High critical
Assignee:	Robert Coie (RETIRED)

URL:
Whiteboard:
Keywords:

Depends on:
Blocks:

Reported:	2002-01-19 06:46 UTC by Mikael Hallendal (hallski) (RETIRED)
Modified:	2005-07-13 06:02 UTC
CC List:	9 users

See Also:
Package list:
Runtime testing required:	---

Attachments
libsandbox.c.diff (libsandbox.diff, 13.82 KB, patch) 2002-12-09 13:38 UTC, J Robert Ray

Description Mikael Hallendal (hallski) (RETIRED) gentoo-dev

2002-01-19 06:46:32 UTC

Emacs segfaults on startup when merged through the sandbox. I've just spent
about 4 hours today and 2 hours yesterday trying to figure out why emacs
segfaults when started. I've rebuilt gcc/glibc/binutils/.../, but then I
remembered a problem gbevin was having with xemacs when merged through the sandbox.

So I disabled the sandbox and remerged emacs, and voila!

Comment 1 Geert Bevin 2002-01-19 09:58:44 UTC

Sadly enough, I've been aware of this problem, but with xemacs. Azarah and I
have been trying to work around it, but I can't seem to find a solution, nor why
it happens. My wild guess is that something might be going wrong during the
compilation of the lisp files. Could you maybe try to selectively disable/enable
the sandbox during the compilation to pinpoint the exact cause of failure? I
have isolated the code that creates this segfaulting emacs in a separate file,
which I'll attach. Anyone, feel free to take a look at it.

Comment 2 Geert Bevin 2002-01-19 10:01:20 UTC

#define _GNU_SOURCE
#define _REENTRANT

#define open xxx_open
#  include <dlfcn.h>
#  include <errno.h>
#  include <fcntl.h>
#  include <stdlib.h>
#  include <sys/stat.h>
#  include <sys/types.h>
#undef open

extern int open(const char*, int, mode_t);
int (*orig_open)(const char*, int, mode_t) = NULL;
int open(const char* pathname, int flags, mode_t mode)
{
	int old_errno = errno;
	
	/* code that makes xemacs' compilation produce a segfaulting executable */
	char** test = NULL;
	test = (char**)malloc(sizeof(char*));
	free(test);
	/* end of that code */

	if (!orig_open)
	{
		orig_open = dlsym(RTLD_NEXT, "open");
	}
	errno = old_errno;
	return orig_open(pathname, flags, mode);
}

Comment 3 Mikael Hallendal (hallski) (RETIRED) gentoo-dev

2002-01-19 16:03:25 UTC

Fixed this by adding SANDBOX_DISABLED in the ebuild.

Comment 4 Geert Bevin 2002-01-20 18:57:16 UTC

I don't think this bug should be marked FIXED yet since we didn't find the exact
cause of the problem. Putting SANDBOX_DISABLED in the ebuild is just a patchy
workaround. The real issue should be found and tracked down.

Comment 5 Mikael Hallendal (hallski) (RETIRED) gentoo-dev

2002-01-20 19:12:49 UTC

ok

Comment 6 Mikael Hallendal (hallski) (RETIRED) gentoo-dev

2002-01-24 02:59:46 UTC

doesn't seem to be resolved. :(((
on cvs.gentoo.org it segfaults even when SANDBOX_DISABLED="1" is in the file.

Comment 7 Geert Bevin 2002-01-24 04:22:10 UTC

Could you try with sandbox not in the maintainer settings? Then it isn't even
called at all and nothing goes through it.

Comment 8 Mikael Hallendal (hallski) (RETIRED) gentoo-dev

2002-01-24 09:01:58 UTC

I will ask Kabau to do so.
A user claims to have this problem without MAINTAINER set so it might not be the
sandbox :(

On my machines, all three worked once I disabled the sandbox. This is very strange!

Comment 9 Evan DiBiase 2002-01-26 00:49:38 UTC

I'll second the unnamed user's reports of segfaulting without MAINTAINER set. I'm currently getting my system up to speed again (wiped it yesterday before I knew about bugs.gentoo.org) and would be happy to help try and troubleshoot this issue.

Comment 10 Geert Bevin 2002-03-18 01:57:06 UTC

This has been happening to people that don't use the sandbox too. Also with the
latest versions I haven't had this problem anymore.

Comment 11 Michael Beattie 2002-08-03 18:29:33 UTC

I just unmerged and then re-merged the latest version of emacs with the sandbox disable=1 in the ebuild, and it still segfaults.

Comment 12 Noah Wild 2002-08-05 13:33:14 UTC

I also re-emerged it as Michael did, with the exact same result. No solution so
far for me.

Comment 13 Noah Wild 2002-08-06 06:41:47 UTC

I updated portage from version 2.0.23 to 2.0.25. Now everything works fine
again. I can emerge emacs-21.2-r1 without any change to the config file.

Comment 14 Ben Kennedy 2002-08-07 08:46:15 UTC

I had the same experience: it stopped working, I updated to portage 2.0.25 (from
2.0.24, I believe), and it started merging with sandbox again.

Comment 15 Peter Davis 2002-08-14 16:07:18 UTC

With portage-2.0.27, I'm seeing this problem again for emacs-21.2-r1, but not for emacs-21.2. The problem is that emerge apparently thinks that 21.2-r1 is a newer version than just 21.2 (it isn't, right?). So maybe this is a portage bug, but I see two solutions:

 * Remove emacs-21.2-r1.ebuild (this is what I did, and it worked)
 * Put something in the make profile to give the newer version preference

Maybe not everyone has problems with -r1, but shouldn't it be removed anyway?

Comment 16 J Robert Ray 2002-11-15 12:29:14 UTC

I may be chasing the wrong lead here but I figured I'd share this stack trace
from the crashing emacs:

#0  0x405a57ba in chunk_alloc (ar_ptr=0x40650e40, nb=32) at malloc.c:2904
#1  0x405a653b in chunk_realloc (ar_ptr=0x40650e40, oldp=0x8337f48, oldsize=12, nb=32) at malloc.c:3514
#2  0x405a61db in __libc_realloc (oldmem=0x40650e40, bytes=1) at malloc.c:3388
#3  0x08159696 in emacs_blocked_realloc (ptr=0x8337f50, size=22) at alloc.c:801
#4  0x405a60fc in __libc_realloc (oldmem=0x40650e40, bytes=1) at malloc.c:3329
#5  0x081592ce in xrealloc (block=0x8337f50, size=22) at alloc.c:544
#6  0x0814c0b8 in regex_compile (pattern=0x825eab8 " +", size=2, syntax=3408388, bufp=0x82da47c) at regex.c:2383
#7  0x08158205 in re_compile_pattern (pattern=0x825eab8 " +", length=2, bufp=0x82da47c) at regex.c:5720
#8  0x08143b9c in compile_pattern_1 (cp=0x82da474, pattern=942009000, translate=1211012816, regp=0x82d18a4, posix=0, multibyte=0) at search.c:166
#9  0x08143d8b in compile_pattern (pattern=942009000, regp=0x82d18a4, translate=1211012816, posix=0, multibyte=0) at search.c:237
...


That malloc.c is inside glibc; alloc.c is in emacs.  It wasn't clear from
looking at the code where the segfault comes from.  I printed some variable
values and didn't find any null pointers or anything.  If the line number in
malloc.c is to be trusted, that's inside a cpp macro, doing some pointer
dereferencing, but none of the pointers were null.


I also tried capturing a log of the entire emerge of emacs with sandbox on and
then with sandbox off, to compare.  Some gcc warnings moved around but the
output of each was exactly the same byte length; there wasn't any meaningful
difference.

A diff of the two emacs binaries shows substantial differences, but reading a hex
dump of a binary doesn't teach me anything.

More investigation is needed.

Comment 17 J Robert Ray 2002-12-09 13:34:46 UTC

This bug has been driving me nuts!  I have been able to modify sandbox to avoid
the segfaulting emacs but I haven't been able to determine the exact cause.

My suspicion is it's a combination of how emacs does its own custom malloc
things and how it produces the final emacs binary with a coredump-like technique
of writing out the current in-memory process.

I can confirm comment #1 that the issue stems from calling malloc, I narrowed it
down to the open64 wrapper.  If you disable just that wrapper, this bug doesn't
happen.  If you replace it with a simple malloc/free combo (like in comment #2),
the bug happens.

I tried defining my own __malloc_hooks in libsandbox in the hopes that maybe the
problem was related to sandbox using the hooks emacs defines, but that wasn't
fruitful.

My workaround was to attempt to avoid calling malloc as much as possible inside
the syscall wrappers.  before_syscall() in libsandbox.c used to parse and
allocate strings for each path in a number of environment variables, do some
tests, then free all the strings.  This function is called by every syscall
wrapper, so it gets called a lot.  I changed it so that it only does the
parse/allocate if the environment variables' values have changed since last time
it was called.
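
To illustrate the idea (this is only a rough sketch, not the code from the attached patch): keep a private copy of the variable's value from the last parse, and skip the parse/allocate step when the current value still compares equal. The names `env_cache`, `parse_paths` and `refresh_if_changed` are hypothetical, and `parse_paths` is just a stand-in for the real path-splitting work.

```c
#define _POSIX_C_SOURCE 200112L  /* exposes setenv() to callers that want to try this out */
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical cache for one environment variable (not libsandbox's real data structure). */
struct env_cache {
    char *last_value;  /* private copy of the value seen at the last parse, NULL before the first */
    int parse_count;   /* number of times the expensive parse step actually ran */
};

/* Stand-in for the expensive step that splits the value into paths and allocates each string. */
static void parse_paths(struct env_cache *cache, const char *value)
{
    (void)value;
    cache->parse_count++;
}

/*
 * Re-parse the variable `name` only if its value differs from the saved copy.
 * Returns 1 if a parse happened, 0 if the cached result was reused,
 * and -1 on allocation failure. An unset variable is treated like an empty one.
 */
static int refresh_if_changed(struct env_cache *cache, const char *name)
{
    assert(cache != NULL && name != NULL);

    const char *current = getenv(name);
    if (current == NULL)
        current = "";

    /* Common case: value unchanged, so no malloc()/free() happens at all. */
    if (cache->last_value != NULL && strcmp(cache->last_value, current) == 0)
        return 0;

    /* Value changed (or first call): save a copy, then do the expensive parse. */
    size_t len = strlen(current) + 1;
    char *copy = malloc(len);
    if (copy == NULL)
        return -1;
    memcpy(copy, current, len);

    free(cache->last_value);
    cache->last_value = copy;
    parse_paths(cache, current);
    return 1;
}
```

With a zero-initialized `struct env_cache`, the first call for a given variable parses, and repeated calls with an unchanged value return 0 without touching the allocator. Only a changed value triggers another parse, so the number of malloc calls drops sharply for a function that runs on every wrapped syscall.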

There is something really confusing here.  If I replace before_syscall() with a
simple function that just calls malloc(), free(), and then returns, the emacs
binary that gets built is broken.  One would assume then that calling
malloc/free here is at fault.  However, with my changes in place,
before_syscall() calls check_syscall(), and in that function at least one call
to malloc always occurs.  But even with this malloc call, emacs builds fine.

Something really nasty is going on.

I'm not sure if I should commit my changes.  It makes emacs work, but doesn't
really fix the problem.  I did clean up a fair amount of code, so it may be
worth committing for that sake.

Comment 18 J Robert Ray 2002-12-09 13:38:21 UTC

Created attachment 6347 [details, diff]
libsandbox.c.diff

Here are my changes if you'd like to examine them.

Comment 19 Martin Schlemmer (RETIRED) gentoo-dev

2002-12-09 13:47:48 UTC

Ok, to be honest, I was too lazy to check the whole thing with a comb ...
shoot me :P   But a quick scroll through looks good :)  It being what it is,
I think it will be better if you commit it to sandbox-dev (it is there for this
purpose) and *not* sandbox-1.1, and then we can give it a bit of testing first.

Also have a look at the execve wrapper (and whether we need to add to other
execve calls, although I think most that modify env call this one, but I haven't
checked that glibc code for some time), as that is pretty much all my code, and
I do not know how buffer overflow proof it is ...

Great work btw =)

Comment 20 J Robert Ray 2002-12-09 14:18:39 UTC

I don't need to know when the env var is changed (I don't want to have to trap
even more syscalls like setenv()), instead I save away the value of getenv() the
first time it is parsed, and on each syscall I strcmp() it with the current
value to determine if it needs to be parsed again.

Is this why you mention execve?

Comment 21 Martin Schlemmer (RETIRED) gentoo-dev

2002-12-09 14:25:04 UTC

Nope, just thought that while you were busy with sandbox, you could just check
that it is sound (no possible segfaults/overflows/etc.) ...

Comment 22 Seemant Kulleen (RETIRED) gentoo-dev

2003-01-17 05:15:03 UTC

jared -- solve this

Comment 23 Jon Portnoy (RETIRED) gentoo-dev

2003-06-04 12:04:46 UTC

This bug is ancient. Is this still an issue?

Comment 24 Heinrich Wendel (RETIRED) gentoo-dev

2003-10-12 07:58:47 UTC

i don't think so...

Comment 25 Sandro Bonazzola (RETIRED) gentoo-dev

2005-07-13 05:37:38 UTC

Latest ebuilds don't have this problem. I've tested on emacs-21.4-r1.
This could be closed.

Comment 26 Martin Schlemmer (RETIRED) gentoo-dev

2005-07-13 06:02:03 UTC

Tried with the broken-out sandbox versions?