Bug 235581 - sys-apps/sandbox is not thread-safe
Summary: sys-apps/sandbox is not thread-safe
Status: RESOLVED FIXED
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: [OLD] Core system
Hardware: All Linux
Importance: High major
Assignee: Sandbox Maintainers
URL:
Whiteboard:
Keywords:
Duplicates: 245226
Depends on:
Blocks:
 
Reported: 2008-08-23 23:18 UTC by Guenther Brunthaler
Modified: 2009-08-24 21:01 UTC (History)
CC List: 4 users

See Also:
Package list:
Runtime testing required: ---


Attachments
The source file which segfaults when run (badtest.c,3.47 KB, text/plain)
2008-08-23 23:22 UTC, Guenther Brunthaler
The script which compiles and runs the badtest.c (runtest,470 bytes, text/plain)
2008-08-23 23:26 UTC, Guenther Brunthaler
Totally stripped-down version of the test application (badtest.c,3.51 KB, text/plain)
2008-08-24 19:19 UTC, Guenther Brunthaler

Description Guenther Brunthaler 2008-08-23 23:18:28 UTC
Bug #234762 is a direct consequence of this bug. A ./configure test segfaults in a heavily multi-threaded testing scenario, while the same test works fine when the configure test executable is run outside the sandbox. The problem is reproducible at least on my quad-core CPU. It may or may not be reproducible on single-core machines.

Reproducible: Always

Steps to Reproduce:
1. Compile the attached source file using the attached shell script
2. Run the generated test executable from within the sandbox shell

Actual Results:  
Test executable dies with a segfault

Expected Results:  
Should print a success message

The attached source file is a sub-portion of the original ./configure test application only.

I have removed all the other tests from the file and reduced it to the multithreading test which creates the segfault.

GDB showed that the problem arose in the unlink() function from the sandbox library substitute when called from the second testing thread, when the unlink function was calling a string comparison function.

In addition, glibc's memory debug routines report a duplicate or invalid free() of a memory block.

I think it might be some sort of race condition; perhaps some buffer pointer variables need protection by semaphores.
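
For readers without access to the attachments, the kind of test described above can be sketched roughly as follows. This is not the attached badtest.c: the function name run_unlink_race, the file names under /tmp, and the iteration count are all assumptions made for illustration. Two threads each create and unlink their own file in a loop, so that unlink() is entered concurrently from both threads.

```c
#include <fcntl.h>
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

#define ITERATIONS 1000  /* hypothetical iteration count */

struct worker {
    char path[64];  /* fixed buffer, never reallocated while the threads run */
    int failures;   /* number of open()/unlink() calls that failed */
};

/* Thread body: repeatedly create an empty file and remove it again. */
static void *create_and_unlink(void *arg)
{
    struct worker *w = arg;
    for (int i = 0; i < ITERATIONS; i++) {
        int fd = open(w->path, O_CREAT | O_WRONLY, 0600);
        if (fd < 0) {
            w->failures++;
            continue;
        }
        close(fd);
        if (unlink(w->path) != 0)
            w->failures++;
    }
    return NULL;
}

/* Returns the total number of failed calls across both threads,
 * or -1 if the threads could not be started. */
int run_unlink_race(void)
{
    struct worker w[2];
    pthread_t tid[2];
    int started = 0, total = 0;

    for (int i = 0; i < 2; i++) {
        w[i].failures = 0;
        snprintf(w[i].path, sizeof w[i].path,
                 "/tmp/unlink_race_%ld_%d", (long)getpid(), i);
        if (pthread_create(&tid[i], NULL, create_and_unlink, &w[i]) != 0)
            break;
        started++;
    }
    for (int i = 0; i < started; i++) {
        pthread_join(tid[i], NULL);
        total += w[i].failures;
    }
    return started == 2 ? total : -1;
}
```

Outside the sandbox this is expected to return 0. Whether this particular sketch triggers the crash inside the sandbox shell has not been verified; it only illustrates the call pattern.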
Comment 1 Guenther Brunthaler 2008-08-23 23:22:26 UTC
Created attachment 163684 [details]
The source file which segfaults when run

This file should be compiled and run using the "runtest" shell script.

When run from within the sandbox shell, it will crash.

When run from outside the sandbox shell, it will just execute fine.
Comment 2 Guenther Brunthaler 2008-08-23 23:26:13 UTC
Created attachment 163685 [details]
The script which compiles and runs the badtest.c

You might edit the file to change the compiler flags, especially "-march=".

However, these are the settings that worked for me when reproducing the crash.

My processor is a quadcore AMD64 Phenom processor stepping B2 in 64 bit mode (with multilib enabled).

I am normally using the "usersandbox" FEATURE.
Comment 3 Guenther Brunthaler 2008-08-23 23:27:26 UTC
Here is my emerge --info output:

Portage 2.1.4.4 (default/linux/amd64/2008.0/desktop, gcc-4.1.2, glibc-2.6.1-r0, 2.6.25-gentoo-r7-xquad-8.226 x86_64)
=================================================================
System uname: 2.6.25-gentoo-r7-xquad-8.226 x86_64 AMD Phenom(tm) 9600 Quad-Core Processor
Timestamp of tree: Sat, 23 Aug 2008 12:45:01 +0000
distcc 2.18.3 x86_64-pc-linux-gnu (protocols 1 and 2) (default port 3632) [disabled]
ccache version 2.4 [enabled]
app-shells/bash:     3.2_p33
dev-java/java-config: 1.3.7, 2.1.6
dev-lang/python:     2.5.2-r6
dev-python/pycrypto: 2.0.1-r6
dev-util/ccache:     2.4-r7
sys-apps/baselayout: 1.12.11.1
sys-apps/sandbox:    1.2.18.1-r2
sys-devel/autoconf:  2.13, 2.61-r2
sys-devel/automake:  1.4_p6, 1.5, 1.7.9-r1, 1.8.5-r3, 1.9.6-r2, 1.10.1
sys-devel/binutils:  2.18-r3
sys-devel/gcc-config: 1.4.0-r4
sys-devel/libtool:   1.5.26
virtual/os-headers:  2.6.23-r3
ACCEPT_KEYWORDS="amd64"
CBUILD="x86_64-pc-linux-gnu"
CFLAGS="-march=k8 -O2 -DNDEBUG -pipe -fno-stack-check"
CHOST="x86_64-pc-linux-gnu"
CONFIG_PROTECT="/etc /usr/kde/3.5/env /usr/kde/3.5/share/config /usr/kde/3.5/shutdown /usr/local/etc /usr/share/X11/xkb /usr/share/config /var/lib/hsqldb"
CONFIG_PROTECT_MASK="/etc/ca-certificates.conf /etc/env.d /etc/env.d/host-variants/ /etc/env.d/java/ /etc/fonts/fonts.conf /etc/gconf /etc/revdep-rebuild /etc/terminfo /etc/texmf/web2c /etc/udev/rules.d"
CXXFLAGS="-march=k8 -O2 -DNDEBUG -pipe -fno-stack-check"
DISTDIR="/usr/portage/distfiles"
EMERGE_DEFAULT_OPTS="--nospinner"
FEATURES="ccache distlocks metadata-transfer notitles parallel-fetch prelink sandbox sfperms strict unmerge-orphans userfetch userpriv usersandbox"
GENTOO_MIRRORS="/usr/portage/local/overlay/distfiles/local /usr/portage/local/overlay/distfiles /usr/portage/local/overlay/distfiles/precious /usr/portage/local/overlay/distfiles/mnt ftp://130.208.16.31/pub/gentoo/ http://140.127.177.17/pub/Linux/Gentoo ftp://140.127.177.17/pub/Linux/Gentoo http://gentoo.mirrors.easynews.com/linux/gentoo/ ftp://140.127.177.15/pub/Linux/Gentoo http://140.127.177.15/pub/Linux/Gentoo http://ftp.udc.es/gentoo/ http://mirrors.64hosting.com/pub/mirrors/gentoo/ http://gentoo.netnitco.net ftp://mirrors.64hosting.com/pub/mirrors/gentoo/"
LANG="de_AT.utf8"
LDFLAGS="-Wl,-O1"
LINGUAS="de"
MAKEOPTS="-j5"
PKGDIR="/usr/portage/packages"
PORTAGE_RSYNC_OPTS="--recursive --links --safe-links --perms --times --compress --force --whole-file --delete --stats --timeout=180 --exclude=/distfiles --exclude=/local --exclude=/packages"
PORTAGE_TMPDIR="/var/tmp"
PORTDIR="/usr/portage"
PORTDIR_OVERLAY="/usr/portage/local/overlay"
SYNC="rsync://rsync.de.gentoo.org/gentoo-portage"
USE="3dnow 3dnowext X a52 aac aalib acpi alsa amd64 apache2 arts audiofile avahi bash-completion berkdb bluetooth branding bzip2 cairo caps cddb cdr cli cracklib crypt css cups curl custom-cflags dbus dri dts dv dvd dvdr dvdread ecc emboss encode evo exif expat fbcon ffmpeg fftw firefox flac foomaticdb fortran freetype ftp fuse gd gdbm gif gimp glut gmp gphoto2 gpm gstreamer gtk gtk2 hal iconv idea ieee1394 imagemagick imlib isdnlog jack java6 javascript jbig jikes jp2 jpeg jpeg2k kde kdeenablefinal kdehiddenvisibility kdexdeltas kipi lcms ldap libcaca libclamav libnotify logrotate lzo mad matroska midi mikmod mmx mmxext mng mp3 mpeg mudflap mule multilib musepack musicbrainz ncurses nls nptl nptlonly nsplugin oav odbc offensive ofx ogg openal opengl openmp pam pcre pdf perl pic png ppds pppd python qt qt3 qt3support qt4 quicktime readline reflection samba sasl screen sdl session sharedmem slang smartcard sndfile sox speex spell spl sqlite sse sse2 sse3 sse4a ssl startup-notification svg sysfs tcltk tetex theora threads threadsafe tiff tk truetype unicode usb userlocales utf8 vcd vorbis wxwindows x264 xft xml xorg xosd xpm xscreensaver xsl xv xvid xvmc zeroconf zlib" ALSA_CARDS="emu10k1" ALSA_PCM_PLUGINS="adpcm alaw asym copy dmix dshare dsnoop empty extplug file hooks iec958 ioplug ladspa lfloat linear meter mmap_emul mulaw multi null plug rate route share shm softvol" APACHE2_MODULES="actions alias auth_basic authn_alias authn_anon authn_dbm authn_default authn_file authz_dbm authz_default authz_groupfile authz_host authz_owner authz_user autoindex cache dav dav_fs dav_lock deflate dir disk_cache env expires ext_filter file_cache filter headers include info log_config logio mem_cache mime mime_magic negotiation rewrite setenvif speling status unique_id userdir usertrack vhost_alias" ELIBC="glibc" INPUT_DEVICES="evdev joystick keyboard mouse void" KERNEL="linux" LCD_DEVICES="bayrad cfontz cfontz633 glk hd44780 lb216 lcdm001 mtxorb ncurses text" LINGUAS="de" 
USERLAND="GNU" VIDEO_CARDS="apm dummy fbdev radeon v4l vesa vga"
Unset:  CPPFLAGS, CTARGET, FFLAGS, INSTALL_MASK, LC_ALL, PORTAGE_COMPRESS, PORTAGE_COMPRESS_FLAGS, PORTAGE_RSYNC_EXTRA_OPTS
Comment 4 Guenther Brunthaler 2008-08-23 23:30:33 UTC
I forgot to mention that I had to remove ccache's bin-directory from $PATH before running "runtest" from within the sandbox shell. Otherwise the script might fail when trying to write to the ccache cache directory of the current user.
Comment 5 Guenther Brunthaler 2008-08-23 23:37:26 UTC
I have not fully analyzed under what exact circumstances the unlink() function segfaults.

It might very well be a bug in the testing application (it is from the PostgreSQL library package "libpq").

But even then, unlink() should not crash inside the sandbox in a situation where it does not crash outside the sandbox.
Comment 6 Wormo (RETIRED) gentoo-dev 2008-08-24 00:50:39 UTC
I was able to trigger the segfault on my core2 duo by running attached test.
It definitely seems like a race. Can take many runs to trigger with a dual cpu; a couple hundred tries was not enough to see the problem, but running
  while ./badtest  ; do :  ; done
usually (but not always) crashes out within a few seconds.

(gdb) bt
#0  0x00002afcefd29550 in strcmp () from /lib/libc.so.6
#1  0x00002afcee8d2dc6 in ?? () from /usr/lib/libsandbox.so
#2  0x00002afcee8d3e3d in ?? () from /usr/lib/libsandbox.so
#3  0x00002afcee8d4731 in unlink () from /usr/lib/libsandbox.so
#4  0x0000000000400d45 in func_call_1 () at badtest.c:90
#5  0x00002afcefaa0047 in start_thread () from /lib/libpthread.so.0
#6  0x00002afcefd7628d in clone () from /lib/libc.so.6
#7  0x0000000000000000 in ?? ()
Comment 7 Guenther Brunthaler 2008-08-24 11:27:40 UTC
(In reply to comment #6)

> (gdb) bt
> #0  0x00002afcefd29550 in strcmp () from /lib/libc.so.6
> #1  0x00002afcee8d2dc6 in ?? () from /usr/lib/libsandbox.so
> #2  0x00002afcee8d3e3d in ?? () from /usr/lib/libsandbox.so
> #3  0x00002afcee8d4731 in unlink () from /usr/lib/libsandbox.so

Yeah, that's the same one.

I also have an idea what might be the problem: As the unlink() function does not modify its argument and strcmp() also does not, I think it can be concluded that something bad happened to the argument of unlink() at the time strcmp() was called from within the sandbox' unlink() function.

As the testing program continuously allocates and deallocates the file name string, I assume the following has happened:

* One of the threads allocates a filename in the buffer and calls unlink with the buffer as the argument

* In the meantime, while the first thread is still within unlink(), the other thread deallocates the buffer containing the filename and allocates a different buffer for the next filename.

* When unlink then tries to access the filename in the old buffer with strcmp(), this will access memory which is no longer allocated, leading to the segfault.

It would be interesting to know what the glibc unlink() function does in order to avoid the same problems.

Because in my opinion, that problem is not easy to solve: Even if access to the whole unlink() function is serialized by using a semaphore, there exists still the possibility a different thread might deallocate the buffer while unlink() is working on it.

This problem is also not confined to the unlink() function in particular, but should affect every sandbox function which accepts a pointer argument pointing to some data structure and is expected to act "atomically".
Comment 8 Guenther Brunthaler 2008-08-24 11:34:28 UTC
Regarding how glibc functions might handle this: I think they need not handle that case at all.

Because after all, glibc is mostly a wrapper around the kernel syscalls which do the actual work.

I imagine unlink() from glibc just does an "INT 0x80" with the processor registers loaded from the string argument of unlink and the syscall-number for "unlink" in another register.

This triggers a soft interrupt, and the kernel will take over, copying the string argument into kernel address space before accessing it.

The kernel therefore works with a safe copy of the string contents, and it will not be a problem if the original buffer is deallocated in the meantime.
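
The "thin wrapper" idea in this comment can be illustrated with the generic syscall() function, which loads the arguments and traps into the kernel without any user-space path processing. This is a sketch, not glibc's actual implementation: the helper name raw_unlink is hypothetical, the exact trap instruction depends on the architecture, and SYS_unlinkat with AT_FDCWD is used here as an assumption because it is available on all Linux architectures.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <fcntl.h>        /* AT_FDCWD */
#include <sys/syscall.h>  /* SYS_unlinkat */
#include <unistd.h>       /* syscall() */

/* Hypothetical helper: removes `path` by invoking the kernel directly.
 * User space only passes the pointer along; the kernel copies the string
 * into its own address space before acting on it.
 * Returns 0 on success, or -1 with errno set. */
int raw_unlink(const char *path)
{
    return (int)syscall(SYS_unlinkat, AT_FDCWD, path, 0);
}
```

The point of the sketch is that user space never inspects the string here, whereas the sandbox's replacement function has to read it (for example with strcmp()) before deciding whether to allow the call.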
Comment 9 Guenther Brunthaler 2008-08-24 11:47:53 UTC
But even in the glibc approach there is a small window for a race condition: If the string argument buffer is deallocated in the small time it takes for the kernel to copy the string contents to an in-kernel buffer, the string might also be corrupted during the copy.

On the other hand, this is actually an application problem: The application should ensure the argument passed to unlink() is "stable" at least as long as unlink() is running.

Problem is, not all applications seem to care about such guarantees.

So, what could be done in the sandbox to avoid the problems?

I would say that, in order to match the behaviour of the emulated glibc functions as closely as possible, the sandbox, just like the kernel, should make a copy of every structure instance passed by reference, whether it is a string or some other type of structure.

This will ensure there is only a very small window during which a different thread can invalidate or corrupt the pointed-to structure instances, and the sandbox will behave approximately identically to the glibc functions even in the event of race conditions.
Comment 10 Guenther Brunthaler 2008-08-24 12:24:31 UTC
If my thoughts from above, which I have not yet verified, turn out to be correct, then of course the PostgreSQL configure test I have submitted as a test case is the one containing the race condition, and not the sandbox itself.

However, I assume problems like this will continue to appear in the future, because race conditions like that are unlikely to be detected on single-core processors, while the chances increase vastly on multicore CPUs.

This could lead to a situation where many packages can no longer be built using the Sandbox, but will compile fine without it, even though the actual faulty ones are the packages, and not the sandbox.

Nevertheless, building a package without the sandbox is probably not a very good idea for security reasons, so it might still be the best solution if the sandbox makes snapshot copies of all of its string arguments in order to mimic glibc's overall behavior as closely as possible even in the presence of race conditions.
Comment 11 Zdenek Behan 2008-08-24 19:08:00 UTC
I'm not sure how relevant this will be, but i get the same error with "sandbox-1.2.18.1-r3", and I am _NOT_ running an amd64 system. I have Core2quad Q6600, and 32bit userspace.

My problem was also with libpq on the same test, and either USE="-threads" or FEATURES="-usersandbox -sandbox" works around the problem.
Comment 12 Guenther Brunthaler 2008-08-24 19:18:07 UTC
Update: I have stripped down the testing program source text even further.

I can now tell that it seems *not* to be the fault of the test program - the string arguments of unlink() are never reallocated and remain constant throughout the remaining execution time of the application.
Comment 13 Guenther Brunthaler 2008-08-24 19:19:36 UTC
Created attachment 163723 [details]
Totally stripped-down version of the test application
Comment 14 m3q 2008-10-10 13:52:53 UTC
I had a problem emerging libpq-8.0.15 (bug 234762, associated with this one: thread safety check failure when using sandbox) on my machine (Athlon X2, amd64, unsafe cflags) with sandbox-1.2.18.1-r2. After unmasking (keyword ~*) and installing sandbox-1.2.18.1-r3, it compiled fine.
Comment 15 SpanKY gentoo-dev 2008-11-09 12:30:44 UTC
i think this is fixed in svn sandbox trunk already as your test case runs just fine with it
Comment 16 Maurice van der Pot (RETIRED) gentoo-dev 2008-11-09 12:44:19 UTC
*** Bug 245226 has been marked as a duplicate of this bug. ***
Comment 17 SpanKY gentoo-dev 2008-11-16 12:32:13 UTC
fixed in svn trunk and will be in next version

to work around the issue in your packages, just force the autoconf test result
Comment 18 Giampaolo Tomassoni 2009-08-24 16:14:04 UTC
I'm running a "stable" 2x Intel Core2 CPUs (4 totals, installed through the amd64 gentoo iso), which shows the very same behaviour.

I *guess* this is because a stable amd64 system nowadays hosts sys-apps/sandbox-1.6-r2 and the fix to this is missing.

Which is the sys-apps/sandbox version hosting your patch, so that I can unmask and install it?
Comment 19 SpanKY gentoo-dev 2009-08-24 21:01:22 UTC
it isnt missing in 1.6-r2, it's disabled