seccompsandbox
Automatically exported from code.google.com/p/seccompsandbox
License: BSD 3-Clause "New" or "Revised" License
seccomp-sandbox includes a Gyp file for building with the Gyp makefile generator (see http://code.google.com/p/gyp/). This is primarily for building seccomp-sandbox as part of Chromium, but it can also be used to build seccomp-sandbox standalone.

To build seccomp-sandbox with Gyp you can do the following:

1) Check out Gyp from SVN:
   $ svn checkout http://gyp.googlecode.com/svn/trunk
2) Add the "gyp" executable (a Python script) to your PATH
3) Run Gyp:
   $ gyp seccomp.gyp --depth=.
   This generates "Makefile". Note that the non-Gyp, non-generated makefile is named "makefile" without an upper case "M" so that it does not get overwritten by Gyp.
4) Run GNU make to build the library:
   $ make -f Makefile
   ...
   AR(target) out/Default/obj.target/libseccomp_sandbox.a
   The resulting library is put in out/Default/obj.target.

Note that the tests are not built by seccomp.gyp yet.

To use a build directory that is separate from the source directory, you can do the following:
   mkdir -p build
   cd build
   gyp ../seccomp.gyp --depth=.
   make
I mentioned this problem on
http://codereview.chromium.org/2074003/show, but I am filing a bug so
that it doesn't get lost.
Currently some of seccomp-sandbox's tests fail on 32-bit systems.
On my netbook, running 32-bit Ubuntu Karmic, two tests failed with SIGSEGV:
test_sa_flags
test_segv_resethand
(NX page protection works on this system.)
On another machine, running 32-bit GHardy, just test_sa_flags failed,
again with SIGSEGV. (NX page protection doesn't work on this system.)
The tests are fine on the two 64-bit machines I tested on.
From sandbox.cc:
// Non-executable version of the restorer function. We use this to
// trigger a SEGV upon returning from the user's signal handler, giving
// us an ability to clean up prior to returning from the SEGV handler.
I don't think this will work on systems where no-execute page
protection doesn't work, i.e. older kernels and older hardware. This
restorer function will run and so the signal handler's counter won't
be decremented. You can verify this by linking the tests with
-Wl,-z,execstack (this option is badly named because it doesn't only
affect the stack).
This explains the test_sa_flags failure.
The test_segv_resethand failure seems a bit odder. The signal
handler's "ret" instruction jumps to non-executable code which causes
a SIGSEGV. But when I examined this with "strace -i" and gdb, the
"ret" is shown as the source of the fault, rather than the address
that "ret" jumps to (which is what the code expects). It looks like
the reported %eip varies between CPUs or kernel versions.
Original issue reported on code.google.com by [email protected]
on 27 Sep 2010 at 1:28
When building with GCC 4.6 on x64, the following build error appears for a
number of files:
In file included from seccompsandbox/syscall_table.h:18:0,
from seccompsandbox/sandbox_impl.h:51,
from seccompsandbox/debug.h:14,
from seccompsandbox/ioctl.cc:5:
seccompsandbox/securemem.h: In static member function ‘static void
playground::SecureMem::sendSystemCall(const
playground::SecureMem::SyscallRequestInfo&, playground::SecureMem::LockType,
T1, T2, T3) [with T1 = int, T2 = int, T3 = void*]’:
seccompsandbox/ioctl.cc:39:61: instantiated from here
seccompsandbox/securemem.h:180:5: error: cast to pointer from integer of
different size [-Werror=int-to-pointer-cast]
seccompsandbox/securemem.h:180:5: error: cast to pointer from integer of
different size [-Werror=int-to-pointer-cast]
This warning was added to GCC 4.6 as part of
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=28584
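One common way to avoid this diagnostic is to widen the integer to a pointer-sized integer type before converting it to a pointer. The following is only a minimal sketch of that idea, not the change that was made to securemem.h; the helper name asSyscallArg is hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers (not part of seccompsandbox): convert a syscall
// argument to void* without a direct int-to-pointer cast.
inline void* asSyscallArg(long value) {
  // Go through a pointer-sized integer so the final cast is between
  // types of the same size.
  return reinterpret_cast<void*>(static_cast<std::intptr_t>(value));
}

inline void* asSyscallArg(int value) {
  // Widening a signed int sign-extends it, so -1 stays -1.
  return asSyscallArg(static_cast<long>(value));
}

inline void* asSyscallArg(const void* pointer) {
  // Pointer arguments are passed through unchanged.
  return const_cast<void*>(pointer);
}
```

With overloads like these, a template such as sendSystemCall() could call asSyscallArg() on each argument instead of casting directly, whatever the argument's type.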
Original issue reported on code.google.com by [email protected]
on 5 Jun 2011 at 8:20
When building with GCC 4.5, this error pops up:
sandbox.cc: In static member function ‘static void
playground::Sandbox::startSandbox()’:
sandbox.cc:447:24: error: ‘secureMem’ was not declared in this scope
In GCC 4.4 and earlier, and in the original C++ standard,
struct A { struct B {}; };
A::A::A::B::B::B::B::B myVariable;
is a perfectly valid declaration of a variable of type A::B.
In GCC 4.5, and in the current C++ standard, A::A is A's constructor, not
the type A. Normally, this won't be a problem, but in this case macro use
allowed it to creep in. Attached very trivial patch (based on the chromium
sources, but applicable without changes) modifies this so that it compiles.
This patch should be completely harmless for other compilers, but I can't
actually test for any ill effects, because patched or unpatched, with GCC
4.4.2 or GCC 4.5 (20091210 snapshot), I get a segmentation fault after an
error is reported on /proc/self/maps. I haven't ruled out the possibility
of a local problem, so don't consider that last part a bug report yet
unless you are already getting that yourself. :)
Original issue reported on code.google.com by [email protected]
on 15 Dec 2009 at 9:17
I discovered that seccomp-sandbox does not currently allow concurrent
sendmsg() and recvmsg() calls. If one thread is blocked in a
recvmsg() call, a second thread that calls sendmsg() will block.
This is because seccomp-sandbox uses a global mutex (syscall_mutex_)
for all syscalls that require data to be written to a secure memory
area by the trusted process. The trusted process will handle only one
syscall at a time, and it waits for syscall_mutex_ to be unlocked
before handling another syscall.
I discovered this while trying to hook up Native Client to use
seccomp-sandbox. Some of the tests deadlocked: there was a background
thread blocked on recvmsg(), while foreground threads would then block
on calls like mmap().
To fix this, I propose two changes:
1) Use one mutex per thread, rather than a global mutex.
2) Change the trusted process so that it does not wait for the
thread's mutex to be unlocked before processing another syscall
(which might come from another thread).
The wait in (2) happens in sendSystemCallInternal() in securemem.cc.
This wait should only be necessary if an allowed syscall has a side
effect that must complete for a subsequent allowed syscall to be safe.
I don't think this is the case for any currently allowed syscalls: the
trusted process does not attempt to model state changes of the
sandboxed process; ordering of syscalls, once checked, is not
significant. (A possible exception is in the IPC syscalls in ipc.cc.)
The only wait needed should be in lockSystemCall(), to prevent a
secure memory area from being reused while it is still in use.
I have got an implementation of these changes which I'll send out
soon.
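The difference between a global lock and per-thread locks can be sketched as follows. This is not the implementation mentioned above; the class, the slot count and the use of an atomic flag in place of the sandbox's futex-based mutex are all assumptions made for illustration.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for per-thread secure memory areas. Each slot has
// its own "in use" flag instead of sharing one global lock.
constexpr std::size_t kSketchSlots = 4;

class PerThreadSlots {
 public:
  PerThreadSlots() {
    for (auto& flag : in_use_) flag.store(false);
  }

  // Claims a slot for one syscall. Returns false only if that same slot
  // is still busy; other slots are unaffected.
  bool tryBegin(std::size_t slot) {
    assert(slot < kSketchSlots);
    return !in_use_[slot].exchange(true, std::memory_order_acquire);
  }

  // Releases the slot so that its secure memory area can be reused.
  void end(std::size_t slot) {
    assert(slot < kSketchSlots);
    in_use_[slot].store(false, std::memory_order_release);
  }

 private:
  std::array<std::atomic<bool>, kSketchSlots> in_use_;
};
```

In this sketch a thread that is stuck in a long-running call keeps only its own slot busy, so a request arriving for another slot can still be started.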
Original issue reported on code.google.com by [email protected]
on 7 Sep 2010 at 3:57
This is a less specific version of issue 9.
We would like to be able to initialise the seccomp sandbox without
needing access to /proc/self/maps, so that we don't have to open a
hole in the SUID sandbox to get access to /proc.
Original issue reported on code.google.com by [email protected]
on 18 Oct 2010 at 1:27
clang's integrated assembler emits the following error when building the
seccomp sandbox:
/tmp/cc-DNyGz3.s:155:9: error: ambiguous instructions require an explicit
suffix (could be 'cmpb', 'cmpw', 'cmpl', or 'cmpq')
cmp $0, 0(%rax)
^
/tmp/cc-DNyGz3.s:157:9: error: ambiguous instructions require an explicit
suffix (could be 'cmpb', 'cmpw', 'cmpl', or 'cmpq')
cmp $1, 0(%rax)
This patch fixes the problem:
Index: fault_handler_i386.S
===================================================================
--- fault_handler_i386.S (revision 153)
+++ fault_handler_i386.S (working copy)
@@ -178,9 +178,9 @@
// callers might be confused by this and will need fixing for running
// inside of the seccomp sandbox.
20:lea playground$sa_segv, %eax
- cmp $0, 0(%eax) // SIG_DFL
+ cmpw $0, 0(%eax) // SIG_DFL
jz 21f
- cmp $1, 0(%eax) // SIG_IGN
+ cmpw $1, 0(%eax) // SIG_IGN
jnz 22f // can't really ignore synchronous signals
Original issue reported on code.google.com by [email protected]
on 26 Jan 2011 at 4:03
In Ubuntu Lucid, libpthread contains calls to the x86-64 vsyscall page:
$ objdump -d /lib/libpthread.so.0
...
ae50: 48 c7 c0 00 00 60 ff mov $0xffffffffff600000,%rax
ae57: ff d0 callq *%rax
...
(This is a call to vgettimeofday.)
When I disassemble these functions in a sandboxed process using gdb, I can see
that the SYSCALL instructions have been patched, but the indirect calls to the
vsyscall page have not. There is code in library.cc for patching indirect
calls, but it is only enabled for patching the vdso.
In practice, the vsyscall calls seem to be conditional on
__have_futex_clock_realtime being false. libpthread won't call vgettimeofday
on a kernel that supports FUTEX_CLOCK_REALTIME.
This issue might be behind the problem with Linux 3.1 (issue chromium:104084),
but I need to investigate more.
This is a difficult problem to solve in general, because it's probably not
practical to enable library.cc's indirect-call patching code for libpthread.so
or libc.so. The kernel does not allow us to patch the vsyscall page (which is
in the kernel range of address space), unlike the vdso. However, the vsyscall
page is deprecated, so we probably don't need to handle the general case.
Original issue reported on code.google.com by [email protected]
on 15 Nov 2011 at 5:10
When the nascent thread is starting, it forks a subprocess which does
a sendmsg() call to the trusted process.
The reason for doing the sendmsg() in a forked subprocess is
presumably to stop the untrusted threads from tampering with
sendmsg()'s "struct msghdr" arguments, which are passed in memory and
not in registers.
However, the trusted thread uses the new thread's stack for the
"struct msghdr" (%ebp in trusted_thread_i386.S). This stack is mapped
by untrusted code, and it could have been mapped with MAP_SHARED, in
which case fork() will not create a private copy.
This means untrusted code could bypass the sandbox's restrictions on
sendmsg() by racing to overwrite this memory. e.g. It could fill out
a non-NULL msg_name value.
I haven't tried testing this though.
The fix would be to use any page that is guaranteed to be mapped with
MAP_PRIVATE.
Does this sound right, Markus?
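The MAP_SHARED versus MAP_PRIVATE behaviour across fork() that this report relies on can be checked in isolation. The sketch below is not sandbox code; the function name is hypothetical, and it uses anonymous mappings and a 4096-byte length purely for illustration.

```cpp
#include <cassert>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Returns true if a write made by a forked child is visible to the parent
// through an anonymous mapping created with the given sharing flag
// (MAP_SHARED or MAP_PRIVATE). Returns false if mmap() or fork() fails.
bool childWriteIsVisible(int sharing_flag) {
  const size_t kLength = 4096;  // illustrative size, not a sandbox constant
  void* mem = mmap(nullptr, kLength, PROT_READ | PROT_WRITE,
                   sharing_flag | MAP_ANONYMOUS, -1, 0);
  if (mem == MAP_FAILED) return false;

  volatile char* page = static_cast<volatile char*>(mem);
  page[0] = 'A';

  bool visible = false;
  pid_t pid = fork();
  if (pid == 0) {
    page[0] = 'B';  // child overwrites the byte
    _exit(0);
  }
  if (pid > 0) {
    int status = 0;
    waitpid(pid, &status, 0);
    visible = (page[0] == 'B');  // did the parent observe the child's write?
  }
  munmap(mem, kLength);
  return visible;
}
```

Calling childWriteIsVisible(MAP_SHARED) should report that the write is visible and childWriteIsVisible(MAP_PRIVATE) that it is not, which is why a page known to be MAP_PRIVATE is the safer home for the "struct msghdr".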
Original issue reported on code.google.com by [email protected]
on 23 Sep 2010 at 9:50
trusted_process.cc writes to one page for each of the kMaxThreads thread slots on startup.
Since kMaxThreads is currently 100, this uses 400k of memory.
It should be possible to change this so that we only touch these pages
as new threads are created.
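The 400k figure follows if each touched page is 4096 bytes. That page size is an assumption here (it is the usual value on x86), and the function name below is hypothetical.

```cpp
#include <cassert>
#include <cstddef>

constexpr std::size_t kMaxThreads = 100;        // value quoted in the report
constexpr std::size_t kAssumedPageSize = 4096;  // assumption: 4 KiB pages

// Memory dirtied when one page is written per thread slot.
constexpr std::size_t touchedBytes(std::size_t threads, std::size_t page_size) {
  return threads * page_size;
}

static_assert(touchedBytes(kMaxThreads, kAssumedPageSize) == 409600,
              "100 pages of 4096 bytes is 409600 bytes, i.e. 400 KiB");
```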
Original issue reported on code.google.com by [email protected]
on 1 Oct 2010 at 12:05
In the Native Client source tree, the code is split into "trusted" and
"untrusted" directories, with an additional "shared" directory for
code that is used in both contexts.
It would be good to do something similar for the seccomp sandbox. It
would make the code easier to review.
Ideally, each of the files that handles specific syscalls (mmap.cc,
open.cc, exit.cc, etc.) would be split into two files, to separate the
sandbox_*() and process_*() functions.
When I was first getting familiar with the codebase, I found that
having sandbox_*() and process_*() in the same file made the codebase
harder to navigate by grepping, because it is not immediately obvious
whether a symbol is referred to from trusted or untrusted code.
Original issue reported on code.google.com by [email protected]
on 21 Oct 2010 at 10:02
seccompsandbox fails on a kernel with PaX because PaX prevents mprotect()
from making executable sections writable. The executables need to be explicitly
marked so that these memory protections are not enforced for them.
Original issue reported on code.google.com by [email protected]
on 14 May 2012 at 6:10
Currently syscallTable is filled out statically in syscall_table.c.
This has to be done in C to make it read-only because of a limitation
in g++.
An alternative would be to fill out the table at run time.
From http://codereview.chromium.org/3414016/show:
"syscall_table.c is only saving us 4k of memory vs. populating at
runtime, and only for non-PIC code. Building this into a PIE or a
library would lose the saving.
Populating the table at runtime would make it easier to define
policies or have alternate syscall handlers. e.g. NaCl requires
modify_ldt(), but it would be good to disable this for other
processes just in case. Plash would like to intercept open() to
operate purely via message passing."
Another advantage would be that the table can be filled out in C++.
The asm("playground$foo") tricks we use to mix C and C++ wouldn't be
needed any more.
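A run-time table with a per-process policy might look roughly like this. Everything in the sketch is hypothetical: the syscall numbers, the handler signature and the policy flag are placeholders, not the layout of syscallTable.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Placeholder syscall numbers for illustration only.
enum SketchSyscall {
  kSketchOpen = 0,
  kSketchMmap = 1,
  kSketchModifyLdt = 2,
  kSketchSyscallCount = 3
};

using SketchHandler = long (*)(long arg);

long handleOpen(long) { return 0; }
long handleMmap(long) { return 0; }
long handleModifyLdt(long) { return 0; }

struct SketchPolicy {
  bool allow_modify_ldt;  // e.g. true only for an embedder such as NaCl
};

// Builds the table at run time. A null entry means "no handler installed".
std::vector<SketchHandler> buildSyscallTable(const SketchPolicy& policy) {
  std::vector<SketchHandler> table(
      static_cast<std::size_t>(kSketchSyscallCount), nullptr);
  table[kSketchOpen] = handleOpen;
  table[kSketchMmap] = handleMmap;
  if (policy.allow_modify_ldt) {
    table[kSketchModifyLdt] = handleModifyLdt;
  }
  return table;
}
```

A static table cannot express this kind of policy choice without building a separate table for each policy. The trade-off is that a table built this way is writable until it is explicitly protected.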
Original issue reported on code.google.com by [email protected]
on 30 Oct 2010 at 1:38
Following on from http://codereview.chromium.org/3380018/show and
http://codereview.chromium.org/3414016/show, for the sake of
completeness, I am filing a bug on this.
There is a vulnerability in process_sigaction() in sigaction.cc, which
does the following:
SecureMem::sendSystemCall(threadFdPub, false, -1, mem, sigaction_req.sysnum,
sigaction_req.signum, sigaction_req.action,
sigaction_req.old_action,
sigaction_req.sigsetsize);
It receives the syscall number sigaction_req.sysnum in a message, but
it passes it on to the trusted thread for execution without checking it.
This means an attacker can execute any syscall with 4 arguments. The
only constraint is that the first argument cannot be 11.
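The missing check amounts to comparing the received syscall number against the ones this handler is meant to forward. The sketch below is an assumption about what such a check could look like, not the fix from the code reviews; the struct is a hypothetical mirror of the fields named above.

```cpp
#include <cassert>
#include <sys/syscall.h>

// Hypothetical mirror of the request fields named in the report; the real
// structure in sigaction.cc may differ.
struct SketchSigactionRequest {
  int sysnum;
  int signum;
};

// Accept only the syscall numbers a sigaction handler should forward.
bool hasExpectedSysnum(const SketchSigactionRequest& request) {
#if defined(SYS_sigaction)
  // Some architectures (such as i386) also have the older sigaction().
  if (request.sysnum == SYS_sigaction) return true;
#endif
  return request.sysnum == SYS_rt_sigaction;
}
```

A trusted-process handler would reject the request when hasExpectedSysnum() returns false, rather than passing sysnum on to the trusted thread.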
Original issue reported on code.google.com by [email protected]
on 27 Sep 2010 at 1:41
test_debugging has started to fail for me on x86-64.
The cause seems to be Debug::enter()'s test for %gs. It checks whether %gs is
zero, and if so, Debug::enter() doesn't increment the recursion counter and it
returns true.
However, we would expect %gs to be zero on x86-64. See the test program below.
The result is that, in debugging mode, we get infinite recursion:
defaultSystemCallHandler() calls Debug::syscall(), which calls gettimeofday(),
which triggers a call to defaultSystemCallHandler(). Without the recursion
check, this calls gettimeofday() again.
On my Ubuntu Lucid VM, this didn't just run out of stack, it triggered the OOM
killer, and my window borders disappeared because the kernel killed Metacity
(!).
What I don't understand is why the test was passing before. I'm not sure what
has changed. Maybe syscall_entrypoint.cc's special case for gettimeofday() was
making this work. But if that is the case, I don't know why this has started
failing.
I am not sure if %fs/%gs should ever show up as having non-zero values on
x86-64. The test program below gives the following output:
%gs = 0
%gs:0 = 1234
%fs = 0
%fs:0 = 139925201401600
#include <stdio.h>
#include <unistd.h>
#include <asm/unistd.h>
#include <asm/prctl.h>
int main() {
  long tls = 1234;
  long val;
  syscall(__NR_arch_prctl, ARCH_SET_GS, &tls);
  asm("mov %%gs, %0" : "=r" (val));
  printf("%%gs = %li\n", val);
  asm("mov %%gs:0, %0" : "=r" (val));
  printf("%%gs:0 = %li\n", val);
  asm("mov %%fs, %0" : "=r" (val));
  printf("%%fs = %li\n", val);
  asm("mov %%fs:0, %0" : "=r" (val));
  printf("%%fs:0 = %li\n", val);
  return 0;
}
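For reference, a recursion check that does not depend on the value of %gs could be sketched as below. This is not how Debug::enter() is written. The names are hypothetical, and the sketch assumes that C++11 thread_local storage is usable at the point where the check runs, which may not hold inside the sandbox.

```cpp
#include <cassert>

namespace sketch {

// Assumption: thread-local storage has been set up for this thread.
thread_local int debug_depth = 0;

// Returns true only for the outermost call on this thread. A nested call,
// for example from a syscall made while logging, gets false and should
// skip the debugging work.
bool enterDebug() {
  if (debug_depth > 0) return false;
  ++debug_depth;
  return true;
}

// Must be called once for every enterDebug() that returned true.
void leaveDebug() {
  assert(debug_depth > 0);
  --debug_depth;
}

}  // namespace sketch
```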
Original issue reported on code.google.com by [email protected]
on 26 Sep 2010 at 12:24
clang complains "error: expression result unused [-Wunused-value]" in a couple
places while building the seccomp sandbox.
I've listed the places below. Instead of silencing the compiler, you probably
want to log an error. I don't know how logging works in the seccomp sandbox.
Index: mutex.h
===================================================================
--- mutex.h (revision 153)
+++ mutex.h (working copy)
@@ -124,7 +124,7 @@
#else
#error Unsupported target platform
#endif
- NOINTR_SYS(sys.futex(mutex, FUTEX_WAKE, 1, 0));
+ (void)NOINTR_SYS(sys.futex(mutex, FUTEX_WAKE, 1, 0));
return rc;
}
Index: sandbox.cc
===================================================================
--- sandbox.cc (revision 153)
+++ sandbox.cc (working copy)
@@ -244,8 +244,8 @@
status_ = STATUS_AVAILABLE;
}
int rc;
- NOINTR_SYS(sys.waitpid(pid, &rc, 0));
- NOINTR_SYS(sys.close(fds[0]));
+ (void)NOINTR_SYS(sys.waitpid(pid, &rc, 0));
+ (void)NOINTR_SYS(sys.close(fds[0]));
return status_ != STATUS_UNSUPPORTED;
}
}
@@ -349,7 +349,7 @@
// Take a snapshot of the current memory mappings. These mappings will be
// off-limits to all future mmap(), munmap(), mremap(), and mprotect() calls.
snapshotMemoryMappings(processFdPub_, proc_self_maps_);
- NOINTR_SYS(sys.close(proc_self_maps_));
+ (void)NOINTR_SYS(sys.close(proc_self_maps_));
proc_self_maps_ = -1;
// Creating the trusted thread enables sandboxing
Index: trusted_process.cc
===================================================================
--- trusted_process.cc (revision 153)
+++ trusted_process.cc (working copy)
@@ -118,8 +118,8 @@
nextThread = currentThread->mem->newSecureMem;
goto newThreadCreated;
} else if (header.sysnum == __NR_exit) {
- NOINTR_SYS(sys.close(iter->second.fdPub));
- NOINTR_SYS(sys.close(iter->second.fd));
+ (void)NOINTR_SYS(sys.close(iter->second.fdPub));
+ (void)NOINTR_SYS(sys.close(iter->second.fd));
SecureMem::Args* secureMem = currentThread->mem;
threads.erase(iter);
secureMemPool_.push_back(secureMem);
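As an alternative to the (void) casts, the result could be checked and reported. The helper below is a hypothetical sketch that writes to stderr; it does not use NOINTR_SYS or the sandbox's own debugging facilities, whose interfaces are not shown here.

```cpp
#include <cassert>
#include <cerrno>
#include <cstdio>
#include <cstring>
#include <unistd.h>

// Hypothetical helper: close a descriptor and report a failure on stderr
// instead of discarding the result.
bool closeAndLog(int fd) {
  if (close(fd) == 0) return true;
  std::fprintf(stderr, "close(%d) failed: %s\n", fd, std::strerror(errno));
  return false;
}
```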
Original issue reported on code.google.com by [email protected]
on 26 Jan 2011 at 4:05
Currently the seccomp sandbox works as a library. After starting up, a process
can enable the sandbox. This means the sandbox is limited to trusted programs
that wish to run parts of themselves untrusted.
It would be good if the seccomp sandbox could be applied to existing programs.
To run an existing executable, we would have to enable sandboxing before the
executable's code is run. Furthermore, we don't want to have to modify glibc's
dynamic linker (ld.so), or trust it. So we would need to enable sandboxing
before the dynamic linker gets control too.
We would need to support whatever syscalls ld.so does on startup. One case of
this is ld.so's TLS initialisation. On i386, this uses set_thread_area(). On
x86-64, it uses arch_prctl()+ARCH_SET_FS.
There is a design sketch for this at http://plash.beasts.org/wiki/SeccompSandbox
Original issue reported on code.google.com by [email protected]
on 11 Nov 2010 at 4:05
Currently, in Chromium, enabling the seccomp sandbox is done entirely
after forking from the zygote process, and this includes patching
libraries. However, it would be good if patching libraries could be
done before fork(). This would have two advantages:
1) Performance: Patching libraries only once would save time and memory.
2) Security, when using the SUID sandbox: Currently the zygote
process needs to keep a directory FD for /proc, because the
seccomp sandbox needs /proc/self/maps in order to do library
patching.
/proc conveys a lot of authority, so this makes the SUID sandbox
less secure than it would otherwise be, even if this FD is only
held by the zygote process and not its children.
If the zygote process had Breakpad enabled (although it's not
supposed to), a SUID-sandboxed process could take control of the
zygote (and hence its /proc FD) by sending it a signal, waiting
for the zygote to make itself dumpable using prctl(), and then
taking control of the zygote using ptrace().
In order to allow patching before fork(), we would need to add a
global flag to the syscall interceptor to pass through syscalls
unaltered until the sandbox has been enabled fully.
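Such a flag could be as simple as the following sketch. The names are hypothetical and the real interceptor is written in assembly, so this only illustrates the control flow.

```cpp
#include <atomic>
#include <cassert>

enum class SyscallRoute { kPassThrough, kSandboxed };

// Hypothetical global flag: false while libraries are patched but the
// sandbox has not been enabled yet.
std::atomic<bool> g_sketch_sandbox_enabled(false);

// Decides how an intercepted syscall should be handled right now.
SyscallRoute routeSyscall() {
  return g_sketch_sandbox_enabled.load(std::memory_order_acquire)
             ? SyscallRoute::kSandboxed
             : SyscallRoute::kPassThrough;
}

// Called once sandbox setup has completed, for example after fork().
void markSandboxEnabled() {
  g_sketch_sandbox_enabled.store(true, std::memory_order_release);
}
```

Because the flag lives in ordinary memory, each child forked from the zygote would get its own copy and could enable the sandbox independently.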
Original issue reported on code.google.com by [email protected]
on 18 Oct 2010 at 1:09
What steps will reproduce the problem?
make -f makefile
What do you see instead?
library.h:159:46: error: 'ssize_t' has not been declared
Solved by including sys/types.h, which, per IEEE Std 1003.1-2001, shall define
ssize_t.
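In other words, any header that uses ssize_t should include sys/types.h itself instead of relying on another header to pull it in. The function below is a hypothetical example, not the declaration at library.h:159.

```cpp
#include <cassert>
#include <sys/types.h>  // defines ssize_t
#include <type_traits>

// Hypothetical function that uses ssize_t in its signature.
ssize_t sketchBytesRemaining(ssize_t total, ssize_t consumed) {
  return total - consumed;
}

static_assert(std::is_signed<ssize_t>::value, "ssize_t is a signed type");
```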
Original issue reported on code.google.com by [email protected]
on 14 May 2012 at 5:47
The seccomp sandbox allocates per-thread data structures on startup,
so the maximum number of threads is fixed at startup. This is set via
kMaxThreads, which is currently set to 100.
This is a bit low for Native Client. Currently we support 8180
threads on Linux, 2556 on Mac OS X, and at least 900 on Windows (see
tests/egyptian_cotton/nacl.scons). The maximum number of threads is
visible to untrusted code. While we don't guarantee any number, 100
is a bit low compared with what we support currently.
See also issue 7.
Original issue reported on code.google.com by [email protected]
on 1 Oct 2010 at 12:06
What steps will reproduce the problem?
make test
What do you see instead?
tests/test_patching.cc: In function 'void patch_range(char*, char*)':
tests/test_patching.cc:19:3: error: '_exit' was not declared in this scope
tests/test_patching.cc:25:66: error: 'getpagesize' was not declared in this
scope
tests/test_patching.cc:26:3: error: '_exit' was not declared in this scope
tests/test_patching.cc:29:3: error: 'close' was not declared in this scope
tests/test_patching.cc:29:3: error: invalid type in declaration before '=' token
tests/test_patching.cc:29:3: error: '_exit' was not declared in this scope
tests/test_patching.cc: In function 'void test_patching_syscall()':
tests/test_patching.cc:33:20: error: 'getpid' was not declared in this scope
tests/test_patching.cc:34:3: error: '_exit' was not declared in this scope
tests/test_patching.cc:39:3: error: '_exit' was not declared in this scope
tests/test_patching.cc:40:3: error: '_exit' was not declared in this scope
tests/test_patching.cc:41:3: error: '_exit' was not declared in this scope
tests/test_patching.cc:42:3: error: '_exit' was not declared in this scope
tests/test_patching.cc:52:3: error: '_exit' was not declared in this scope
tests/test_patching.cc: In function 'void check_patching_vsyscall(char*,
char*)':
tests/test_patching.cc:76:3: error: '_exit' was not declared in this scope
tests/test_patching.cc:77:3: error: '_exit' was not declared in this scope
tests/test_patching.cc:78:3: error: '_exit' was not declared in this scope
tests/test_patching.cc:79:3: error: '_exit' was not declared in this scope
tests/test_patching.cc:80:3: error: '_exit' was not declared in this scope
tests/test_patching.cc:81:3: error: '_exit' was not declared in this scope
tests/test_patching.cc:82:3: error: '_exit' was not declared in this scope
tests/test_patching.cc:83:3: error: '_exit' was not declared in this scope
tests/test_patching.cc:84:3: error: '_exit' was not declared in this scope
tests/test_patching.cc:85:3: error: '_exit' was not declared in this scope
tests/test_patching.cc:86:3: error: '_exit' was not declared in this scope
tests/test_patching.cc:87:3: error: '_exit' was not declared in this scope
tests/test_patching.cc:88:3: error: '_exit' was not declared in this scope
tests/test_patching.cc:89:3: error: '_exit' was not declared in this scope
tests/test_patching.cc: In function 'void
test_patching_vsyscall_gettimeofday()':
tests/test_patching.cc:95:3: error: '_exit' was not declared in this scope
tests/test_patching.cc:96:3: error: '_exit' was not declared in this scope
tests/test_patching.cc:97:3: error: '_exit' was not declared in this scope
tests/test_patching.cc:102:3: error: '_exit' was not declared in this scope
tests/test_patching.cc:103:3: error: '_exit' was not declared in this scope
tests/test_patching.cc: In function 'void test_patching_vsyscall_time()':
tests/test_patching.cc:109:3: error: '_exit' was not declared in this scope
tests/test_patching.cc:111:3: error: '_exit' was not declared in this scope
tests/test_patching.cc:117:3: error: '_exit' was not declared in this scope
tests/test_patching.cc: In function 'void test_patching_vsyscall_getcpu()':
tests/test_patching.cc:121:3: error: '_exit' was not declared in this scope
tests/test_patching.cc:129:3: error: '_exit' was not declared in this scope
make: *** [tests/test_patching.o64] Error 1
Solved by including unistd.h, which is the proper header for declaring _exit(),
close(), getpid() and getpagesize().
All but getpagesize() are specified by POSIX to have their prototypes in unistd.h.
getpagesize() appears in SVr4, 4.4BSD and SUSv2, and is also declared in unistd.h.
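A file that calls these functions therefore needs the include at the top, as in this small sketch. The helper is hypothetical and is not taken from tests/test_patching.cc.

```cpp
#include <cassert>
#include <unistd.h>  // declares _exit(), close(), getpid(), getpagesize()

// Hypothetical helper: round an address down to the start of its page.
unsigned long pageStart(unsigned long address) {
  const unsigned long page = static_cast<unsigned long>(getpagesize());
  return address - (address % page);
}
```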
Original issue reported on code.google.com by [email protected]
on 14 May 2012 at 5:55