Sandbox failures bug 161045 and bug 131505 are down to Lisps providing their own malloc, which sandbox ends up calling when setting up env vars etc, but doesn't behave exactly like a standard malloc should. The solution is obvious if you're insane: we should provide our own malloc. The easiest way to do this is to take a malloc (e.g. ptmalloc3) that we can limit to mmap (no morecore/sbrk), link in to the sandbox sofile, and make sure that all allocation functions (strdup etc.) call our malloc. Crazy, but it works; I'll attach the POC patch (against 1.2.18.1), confirmed to work with both gcl and emacs-cvs.
Created attachment 108718 [details, diff] sandbox-use-own-malloc.patch Against 1.2.18.1; unpack http://www.malloc.de/malloc/ptmalloc3-current.tar.gz into src/.
erm, that's really a path to pain ... why not lookup malloc in the libc library like we do with the other functions and call that
Because there can only be one malloc using the heap. If you have more than one malloc calling sbrk() then one or both will get very confused.
you mean the lisp is using sbrk so that would cause us troubles when the glibc malloc attempts to use sbrk how exactly does the lisp malloc differ ? perhaps changing those would be an option i really really really want to avoid what you're proposing here simply because it can easily be a maintenance nightmare
(In reply to comment #4) > you mean the lisp is using sbrk so that would cause us troubles when the glibc > malloc attempts to use sbrk Yeah, the lisp malloc assumes that it owns the whole heap and is the only one calling sbrk. > how exactly does the lisp malloc differ ? perhaps changing those would be an > option It's like nothing I've seen anywhere else. It's heavily integrated with lisp gc (which happily relocates objects) and with the GCL serialization-through-memory-dump (si::save-system). I've tried tinkering with various bits but getting something that works and doesn't break assumptions elsewhere is beyond me. And, of course, it's completely undocumented. > i really really really want to avoid what you're proposing here simply because > it can easily be a maintenance nightmare Will it, though? Sandbox is constrained to using a minimal subset of the C library, and there really isn't that much that does memory allocation: - malloc/calloc/realloc/free: done or easily implementable - getcwd(NULL, ...): non-POSIX, avoidable - realpath(..., NULL): non-POSIX, avoidable - strdup: implemented in patch - tempnam: highly deprecated
you feel like porting and validating this custom malloc to every architecture/OS we support ? i dont what i meant by "how does it differ" is "why is it actually causing crashes" ? if all the lisp malloc is doing is allocating memory and giving us a pointer to it, why is it crashing ? glibc is designed to allow custom memory allocators take over the symbols at runtime
(In reply to comment #6) > you feel like porting and validating this custom malloc to every > architecture/OS we support ? i dont That's why I'm suggesting using ptmalloc; it's the standard malloc, everyone uses it. > what i meant by "how does it differ" is "why is it actually causing crashes" ? > if all the lisp malloc is doing is allocating memory and giving us a pointer to > it, why is it crashing ? glibc is designed to allow custom memory allocators > take over the symbols at runtime It's giving us a pointer to memory past the top of the heap. The reason this happens: the backtrace has #4 0x08088ac2 in unexec ( new_name=0x9001000 <Address 0x9001000 out of bounds>, old_name=0x1003 <Address 0x1003 out of bounds>, data_start=1073902592, bss_start=0, entry_address=0) at unexelf.c:672 #5 0x08089d3e in Lsave () at save.c:12 #6 0x0805293d in siLsave_system () at main.c:977 In siLsave_system() GCL gcs and compacts (relocating) the heap down, then uses brk() to shrink it to the actually used region. It then calls Lsave() which is supposed to dump the data segment into a file, then grows the heap back up to give room to allocate from. The problem is that sandbox calls malloc repeatedly inside Lsave(); as a result GCL's malloc gets very confused (nothing is supposed to be calling malloc at this point, because it's saving the data segment) and returns a pointer past the (temporarily lowered) top of the heap. I don't see how this is fixable, bar severely modifying gcl; gcl's malloc is not designed to be called within Lsave().
your patch there doesnt actually include any source code ... however, a trivial implementation of malloc() would be to simply call anonymous mmap()'s ... and then free() would simply be munmap() ... that would be completely portable
Created attachment 113681 [details, diff] sandbox-use-own-malloc.patch (In reply to comment #8) > your patch there doesnt actually include any source code ... It uses ptmalloc3 from the tarball (i.e. just add ptmalloc3-current.tar.gz to SRC_URI, apply the patch, run eautoreconf). It finds ptmalloc via ptmalloc_srcdir=$(top_srcdir)/../ptmalloc3. I forgot to put up the more recent version of the patch, sorry. The ebuild is in my tree, of course.
(In reply to comment #8) > however, a trivial implementation of malloc() would be to simply call anonymous > mmap()'s ... and then free() would simply be munmap() ... that would be > completely portable Well, that is effectively what we have compiling ptmalloc to use mmap. Do you mean providing our own implementations of the malloc/calloc/realloc/free symbols for resolution within libsandbox, or replacing calls with mmap/munmap? I'm prepared to implement either if it'd get accepted, but I'd prefer to know what you think would work.
i really dont know what "ptmalloc" is since your patch didnt include the ebuild patch assuming you mean http://www.malloc.de/, that's way over the top for what i would want to include replacing with mmap/munmap is quite trivial: void *malloc(size_t size) { size_t *ret; size += sizeof(size_t); ret = mmap(0, size, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0); if (ret == MAP_FAILED) return NULL; *ret = size; return ret+1; } void free(void *ptr) { size_t *mmap_start, size; if (ptr == NULL) return; mmap_start = ((size_t*)ptr) - 1; size = *mmap_start; munmap(mmap_start, size); }
Created attachment 113720 [details] malloc.c
due to the obvious performance degradation versus the glibc malloc, i'm not sure we want to have this enabled for everyone all the time ... perhaps in sandbox startup we would do: - grab malloc symbols from libc.so.6 - grab malloc symbols from normal resolving - if they are different, have sandbox use internal malloc symbols or we could just have the behavior depend on an env var that broken packages would need to export: SANDBOX_LOCAL_MALLOC=1
actually, that has a subtle buffer overflow when using realloc() to shrink a buffer ... the memcpy() needs to read: memcpy(ret, ptr, (size < old_malloc_size ? size : old_malloc_size)); otherwise, if this solves your problems with the test case packages, i have no problem merging this version
Created attachment 122973 [details, diff] minimal-malloc.patch Sorry, been busy for a while. Here's a patch with your mmalloc.c, with strdup added and improved NULL handling, plus src/Makefile.am change. I've been using this for a while with no issues. As for performance degradation, I'm not so sure that there is much of an issue here. Obviously there will be some degradation on startup/init, but if we're doing malloc at syscall time then that's going to be slow anyway.
sure there's a performance difference ... this simple malloc has an exact 1-to-1 correlation between malloc() and having to do a system call the glibc malloc is able to satisfy smaller requests from its own managed heap without having to enter kernel space but unless someone does some actual profiling, i dont think it'll be a big deal for the majority of our users
Created attachment 132779 [details, diff] minimal-malloc.patch Updated, respects SANDBOX_LOCAL_MALLOC env flag and fixes a strdup bug.
you can just do 'return memcpy(...)' in strdup()
Created attachment 132845 [details, diff] minimal-malloc.patch Yup.
Created attachment 134085 [details, diff] minimal-malloc.patch Was having issues calling getenv from free at process exit. This checks it once only.
ive committed this stuff to svn ... but ive always enabled it. this is because the number of allocations actually done by libsandbox is relatively small, so the perf penality is negligible i think. it also simplifies the code and keeps us from having to worry about tracking two distinctive code paths and maintainers having to track down random crashes ... things "just work" for them
NB, this will not help for Emacs. The reason that we disable the sandbox there is a different one: Emacs dumps its executable, and we don't want any sandbox code included with its final binary. See bug 233 and bug 131505 for further details.
the pkg that prompted this was gcl, not emacs. i know emacs has a retarded build system and i really dont plan on trying to "fix" it.
> the pkg that prompted this was gcl, not emacs. Yes, but comment #0 also mentions emacs-cvs.