
rust-vmm / seccompiler


Provides easy-to-use Linux seccomp-bpf jailing.

Home Page: https://crates.io/crates/seccompiler

License: Apache License 2.0

Rust 94.69% Shell 5.31%
seccomp-bpf seccomp

seccompiler's Introduction

Seccompiler

Provides easy-to-use Linux seccomp-bpf jailing.

Seccomp is a Linux kernel security feature that enables tight control over which kernel-level mechanisms a process can access. It is typically used to reduce the attack surface and exposed resources when running untrusted code. It works by allowing users to write and install a BPF (Berkeley Packet Filter) program for each process or thread, which intercepts syscalls and decides whether each one is safe to execute.

Writing BPF programs by hand is difficult and error-prone. This crate provides high-level wrappers for working with system call filtering.

Supported platforms

Because seccomp is a Linux-specific feature, this crate is supported only on Linux systems.

Supported host architectures:

  • Little-endian x86_64
  • Little-endian aarch64

Short seccomp tutorial

Linux supports seccomp filters as BPF programs that are interpreted by the kernel before each system call.

They are installed in the kernel using prctl(PR_SET_SECCOMP) or seccomp(SECCOMP_SET_MODE_FILTER).

As input, the BPF program receives a C struct of the following type:

struct seccomp_data {
    int nr; // syscall number.
    __u32 arch; // arch-specific value for validation purposes.
    __u64 instruction_pointer; // as the name suggests..
    __u64 args[6]; // syscall arguments.
};
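To make the layout of this input concrete, here is a minimal Rust sketch that mirrors the C struct above. The type name `SeccompData`, the offset constants and the two helper functions are hypothetical names chosen for illustration; they are not part of seccompiler's API. The sketch assumes a little-endian host (as on the supported architectures), where the low 32 bits of each 64-bit argument come first in memory. That matters because classic BPF loads 32-bit words, so a filter reads each argument as two halves.

```rust
use std::mem::size_of;

/// Illustrative Rust mirror of the kernel's `struct seccomp_data`.
/// (Hypothetical type for this sketch, not a seccompiler type.)
#[repr(C)]
#[derive(Debug, Clone, Copy, Default)]
pub struct SeccompData {
    pub nr: i32,                  // syscall number
    pub arch: u32,                // AUDIT_ARCH_* value
    pub instruction_pointer: u64, // IP at the time of the syscall
    pub args: [u64; 6],           // syscall arguments
}

// Byte offsets of the fields, as a BPF load instruction would address them.
pub const OFFSET_NR: usize = 0;
pub const OFFSET_ARCH: usize = 4;
pub const OFFSET_IP: usize = 8;
pub const OFFSET_ARGS: usize = 16;

/// Offset of the low 32 bits of argument `index` (little-endian host assumed).
pub fn arg_lo_offset(index: usize) -> Option<usize> {
    if index < 6 {
        Some(OFFSET_ARGS + index * size_of::<u64>())
    } else {
        None
    }
}

/// Offset of the high 32 bits of argument `index` (little-endian host assumed).
pub fn arg_hi_offset(index: usize) -> Option<usize> {
    arg_lo_offset(index).map(|lo| lo + size_of::<u32>())
}
```

With this layout the whole structure is 64 bytes, and argument 1 (for example the `cmd` argument of `fcntl`) starts at byte 24.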

In response, a filter returns an action, which can be one of:

#define SECCOMP_RET_KILL_PROCESS 0x80000000U /* kill the process */
#define SECCOMP_RET_KILL_THREAD  0x00000000U /* kill the thread */
#define SECCOMP_RET_TRAP         0x00030000U /* disallow and force a SIGSYS */
#define SECCOMP_RET_ERRNO        0x00050000U /* returns an errno */
#define SECCOMP_RET_TRACE        0x7ff00000U /* pass to a tracer or disallow */
#define SECCOMP_RET_LOG          0x7ffc0000U /* allow after logging */
#define SECCOMP_RET_ALLOW        0x7fff0000U /* allow */
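The 32-bit return value carries both the action and an optional 16-bit data field (for example the errno to return). The sketch below shows how the two parts can be separated. The action constants are copied from the list above; the two masks, `SECCOMP_RET_ACTION_FULL` and `SECCOMP_RET_DATA`, come from the kernel's `linux/seccomp.h` rather than from this document. The function names are hypothetical.

```rust
pub const SECCOMP_RET_KILL_PROCESS: u32 = 0x8000_0000;
pub const SECCOMP_RET_KILL_THREAD: u32 = 0x0000_0000;
pub const SECCOMP_RET_TRAP: u32 = 0x0003_0000;
pub const SECCOMP_RET_ERRNO: u32 = 0x0005_0000;
pub const SECCOMP_RET_TRACE: u32 = 0x7ff0_0000;
pub const SECCOMP_RET_LOG: u32 = 0x7ffc_0000;
pub const SECCOMP_RET_ALLOW: u32 = 0x7fff_0000;

// Masks from linux/seccomp.h: upper 16 bits select the action,
// lower 16 bits carry action-specific data.
pub const SECCOMP_RET_ACTION_FULL: u32 = 0xffff_0000;
pub const SECCOMP_RET_DATA: u32 = 0x0000_ffff;

/// Split a filter return value into (action, data).
pub fn split_return(ret: u32) -> (u32, u16) {
    (ret & SECCOMP_RET_ACTION_FULL, (ret & SECCOMP_RET_DATA) as u16)
}

/// Build the return value for "fail the syscall with this errno".
pub fn errno_return(errno: u16) -> u32 {
    SECCOMP_RET_ERRNO | u32::from(errno)
}

/// Human-readable name of the action part of a return value.
pub fn action_name(ret: u32) -> &'static str {
    match ret & SECCOMP_RET_ACTION_FULL {
        SECCOMP_RET_KILL_PROCESS => "kill_process",
        SECCOMP_RET_KILL_THREAD => "kill_thread",
        SECCOMP_RET_TRAP => "trap",
        SECCOMP_RET_ERRNO => "errno",
        SECCOMP_RET_TRACE => "trace",
        SECCOMP_RET_LOG => "log",
        SECCOMP_RET_ALLOW => "allow",
        _ => "unknown",
    }
}
```

For instance, `errno_return(1)` yields `0x00050001`, which splits back into the `SECCOMP_RET_ERRNO` action and the data value 1.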

Design

The core concept of the library is the filter. It is an abstraction that models a collection of syscall-mapped rules, coupled with on-match and default actions, that logically describes a policy for dispatching actions (e.g. Allow, Trap, Errno) for incoming system calls.

Seccompiler provides constructs for defining filters, compiling them into loadable BPF programs and installing them in the kernel.

Filters are defined either with a JSON file or using Rust code, with library-defined structures. Both representations are semantically equivalent and model the rules of the filter. Choosing one or the other depends on the use case and preference.

The core of the package is the module responsible for the BPF compilation. It compiles seccomp filters expressed as Rust code, into BPF filters, ready to be loaded into the kernel. This is the seccompiler backend.

The process of translating JSON filters into BPF goes through an extra step of deserialization and validation (the JSON frontend), before reaching the same backend for BPF codegen.

The Rust representation is therefore also an Intermediate Representation (IR) of the JSON filter. This modular implementation allows for extensibility with regard to file formats. All that is needed is a compatible frontend.

The diagram below illustrates the steps required for the JSON and Rust filters to be compiled into BPF. The blue boxes represent potential user input.

seccompiler architecture

Filter definition

Let us take a closer look at what a filter is composed of, and how it is defined:

The smallest unit of the filter is the SeccompCondition, which is a comparison operation applied to the current system call. It’s parametrised by the argument index, the length of the argument, the operator and the actual expected value.

Going one step further, a SeccompRule is a vector of SeccompConditions, that must all match for the rule to be considered matched. In other words, a rule is a collection of and-bound conditions for a system call.

Finally, at the top level, there’s the SeccompFilter. The filter can be viewed as a collection of syscall-associated rules, with a predefined on-match action and a default action that is returned if none of the rules match.

In a filter, each system call number maps to a vector of or-bound rules. In order for the filter to match, it is enough that one rule associated to the system call matches. A system call may also map to an empty rule vector, which means that the system call will match, regardless of the actual arguments.
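The matching semantics described above (and-bound conditions inside a rule, or-bound rules per syscall, and an empty rule vector that always matches) can be modelled in a few lines of plain Rust. This is a toy evaluator for illustration only: the types `Condition`, `Rule`, `Filter` and `Action` are hypothetical stand-ins, not seccompiler's own types. Only an equality comparison is modelled, and the syscall numbers are the x86_64 values, hardcoded here as an assumption so the sketch has no dependencies.

```rust
use std::collections::BTreeMap;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Action {
    Allow,
    KillProcess,
}

/// One comparison: `args[arg_index] == expected` (only equality is modelled).
#[derive(Debug, Clone, Copy)]
pub struct Condition {
    pub arg_index: usize,
    pub expected: u64,
}

/// A rule matches only if all of its conditions match (and-bound).
pub type Rule = Vec<Condition>;

pub struct Filter {
    pub rules: BTreeMap<i64, Vec<Rule>>,
    pub match_action: Action,
    pub mismatch_action: Action,
}

impl Filter {
    pub fn evaluate(&self, nr: i64, args: &[u64; 6]) -> Action {
        let matched = match self.rules.get(&nr) {
            None => false,
            // An empty rule vector matches regardless of the arguments.
            Some(rules) if rules.is_empty() => true,
            // Otherwise one matching rule is enough (or-bound).
            Some(rules) => rules.iter().any(|rule| {
                rule.iter()
                    .all(|c| args.get(c.arg_index) == Some(&c.expected))
            }),
        };
        if matched { self.match_action } else { self.mismatch_action }
    }
}

// x86_64 values, hardcoded for the sketch.
pub const SYS_FCNTL: i64 = 72;
pub const SYS_ACCEPT4: i64 = 288;
pub const F_GETFD: u64 = 1;
pub const F_SETFD: u64 = 2;
pub const FD_CLOEXEC: u64 = 1;

/// Allows accept4, fcntl(_, F_SETFD, FD_CLOEXEC) and fcntl(_, F_GETFD).
pub fn example_filter() -> Filter {
    let mut rules = BTreeMap::new();
    rules.insert(SYS_ACCEPT4, Vec::new());
    rules.insert(
        SYS_FCNTL,
        vec![
            vec![
                Condition { arg_index: 1, expected: F_SETFD },
                Condition { arg_index: 2, expected: FD_CLOEXEC },
            ],
            vec![Condition { arg_index: 1, expected: F_GETFD }],
        ],
    );
    Filter {
        rules,
        match_action: Action::Allow,
        mismatch_action: Action::KillProcess,
    }
}
```

Evaluating `example_filter()` against `fcntl(3, F_SETFD, FD_CLOEXEC)` yields `Allow`, while `fcntl(3, F_SETFD, 0)` falls through both rules and yields `KillProcess`.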

The following diagram models a simple filter, that only allows accept4, fcntl(any, F_SETFD, FD_CLOEXEC, ..) and fcntl(any, F_GETFD, ...). For any other system calls, the process will be killed.

filter diagram

As specified earlier, there are two ways of expressing the filters:

  1. JSON (documented in json_format.md);
  2. Rust code (documented by the library).

See below examples of both representation methods, for a filter equivalent to the diagram above:

Example JSON filter

{
    "main_thread": {
        "mismatch_action": "kill_process",
        "match_action": "allow",
        "filter": [
            {
                "syscall": "accept4"
            },
            {
                "syscall": "fcntl",
                "args": [
                    {
                        "index": 1,
                        "type": "dword",
                        "op": "eq",
                        "val": 2,
                        "comment": "F_SETFD"
                    },
                    {
                        "index": 2,
                        "type": "dword",
                        "op": "eq",
                        "val": 1,
                        "comment": "FD_CLOEXEC"
                    }
                ]
            },
            {
                "syscall": "fcntl",
                "args": [
                    {
                        "index": 1,
                        "type": "dword",
                        "op": "eq",
                        "val": 1,
                        "comment": "F_GETFD"
                    }
                ]
            }
        ]
    }
}

Note that JSON files need to specify a name for each filter. While in the example above there is only one (main_thread), other programs may be using multiple filters.

Example Rust-based filter

SeccompFilter::new(
    // rule set - BTreeMap<i64, Vec<SeccompRule>>
    vec![
        (libc::SYS_accept4, vec![]),
        (
            libc::SYS_fcntl,
            vec![
                SeccompRule::new(vec![
                    Cond::new(
                        1,
                        SeccompCmpArgLen::Dword,
                        SeccompCmpOp::Eq,
                        libc::F_SETFD as u64,
                    )?,
                    Cond::new(
                        2,
                        SeccompCmpArgLen::Dword,
                        SeccompCmpOp::Eq,
                        libc::FD_CLOEXEC as u64,
                    )?,
                ])?,
                SeccompRule::new(vec![
                    Cond::new(
                        1,
                        SeccompCmpArgLen::Dword,
                        SeccompCmpOp::Eq,
                        libc::F_GETFD as u64,
                    )?
                ])?
            ]
        )
    ].into_iter().collect(),
    // mismatch_action
    SeccompAction::KillProcess,
    // match_action
    SeccompAction::Allow,
    // target architecture of filter
    TargetArch::x86_64,
)?

Example usage

Using seccompiler in an application is a two-step process:

  1. Compiling filters (into BPF)
  2. Installing filters

Compiling filters

A user application can compile the seccomp filters into loadable BPF either at runtime or at build time.

At runtime, the process is straightforward, leveraging the seccompiler library functions on hardcoded/file-based filters.

At build-time, an application can use a cargo build script that adds seccompiler as a build-dependency and outputs at a predefined location (e.g. using env::var("OUT_DIR")) the compiled filters, that have been serialized to a binary format (e.g. bincode). They can then be ingested by the application using include_bytes! and deserialized before getting installed. This build-time option can be used to shave off the filter compilation time from the app startup time, if using a low-overhead binary format.
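As a rough illustration of the build-time flow, the sketch below round-trips a compiled program through a byte buffer. It deliberately swaps bincode for a hand-rolled fixed-width encoding so that it stays dependency-free. The `Insn` type is a hypothetical stand-in mirroring the kernel's 8-byte `sock_filter` instruction (`code`, `jt`, `jf`, `k`), and the little-endian format is an assumption of this sketch, not something the library prescribes. In a real application the bytes would be written by the build script and read back with `include_bytes!`.

```rust
/// Stand-in for one classic BPF instruction (same fields as `sock_filter`).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Insn {
    pub code: u16,
    pub jt: u8,
    pub jf: u8,
    pub k: u32,
}

/// Size of one encoded instruction in this sketch's format.
pub const INSN_SIZE: usize = 8;

/// Build-script side: flatten a program into little-endian bytes.
pub fn encode(prog: &[Insn]) -> Vec<u8> {
    let mut out = Vec::with_capacity(prog.len() * INSN_SIZE);
    for insn in prog {
        out.extend_from_slice(&insn.code.to_le_bytes());
        out.push(insn.jt);
        out.push(insn.jf);
        out.extend_from_slice(&insn.k.to_le_bytes());
    }
    out
}

/// Application side: rebuild the program, rejecting truncated input.
pub fn decode(bytes: &[u8]) -> Option<Vec<Insn>> {
    if bytes.len() % INSN_SIZE != 0 {
        return None;
    }
    Some(
        bytes
            .chunks_exact(INSN_SIZE)
            .map(|c| Insn {
                code: u16::from_le_bytes([c[0], c[1]]),
                jt: c[2],
                jf: c[3],
                k: u32::from_le_bytes([c[4], c[5], c[6], c[7]]),
            })
            .collect(),
    )
}
```

A fixed-width format like this needs no parsing beyond a length check, which is the kind of low-overhead property that makes the build-time option worthwhile.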

Regardless of the compilation moment, the process is the same:

For JSON filters, the compilation to loadable BPF is performed using the compile_from_json() function:

let filters: BpfMap = seccompiler::compile_from_json(
    File::open("/path/to/json")?, // Accepts generic Read objects.
    seccompiler::TargetArch::x86_64,
)?;

BpfMap is another type exposed by the library, which maps thread categories to BPF programs.

pub type BpfMap = HashMap<String, BpfProgram>;

Note that, in order to use the JSON functionality, you need to add the json feature when importing the library.

For Rust filters, it’s enough to perform a try_into() cast, from a SeccompFilter to a BpfProgram:

let seccomp_filter = SeccompFilter::new(
    rules,
    SeccompAction::Trap,
    SeccompAction::Allow,
    seccompiler::TargetArch::x86_64
)?;

let bpf_prog: BpfProgram = seccomp_filter.try_into()?;

Installing filters

let bpf_prog: BpfProgram; // Assuming it was initialized with a valid filter.

seccompiler::apply_filter(&bpf_prog)?;

It’s interesting to note that installing the filter does not take ownership or invalidate the BPF program, thanks to the kernel which performs a copy_from_user on the program before installing it.

Feature documentation

The documentation on docs.rs does not include the feature-gated json functionality.

In order to view the documentation including the optional json feature, you may run: cargo doc --open --all-features

Seccomp best practices

  • Before installing a filter, make sure that the current kernel version supports the actions of the filter. This can be checked by inspecting the output of: cat /proc/sys/kernel/seccomp/actions_avail or by calling the seccomp(SECCOMP_GET_ACTION_AVAIL) syscall.

  • The recommendation is to use an allow-list approach for the seccomp filter, only allowing the bare minimum set of syscalls required for your application. This is safer and more robust than a deny-list, which would need updating whenever a new, dangerous system call is added to the kernel.

  • When determining the set of system calls needed by an application, it is recommended to exhaustively run all the code paths, while tracing with strace or perf. It is also important to note that applications rarely use the system call interface directly. They usually use libc wrappers which, depending on the implementation, use different system calls for the same functionality (e.g. open vs openat).

  • Linux supports installing multiple seccomp filters on a thread/process. They are all evaluated in-order and the most restrictive action is chosen. Unless your application needs to install multiple filters on a thread, it is recommended to deny the prctl and seccomp system calls, to avoid having malicious actors further restrict the installed filters.

  • The Linux vDSO usually causes some system calls to run entirely in userspace, bypassing the seccomp filters (for example clock_gettime). This can lead to failures when running on machines that don't support the same vDSO system calls, if the said syscalls are used but not allowed. It is recommended to also test the seccomp filters on a machine that doesn't have vDSO, if possible.

  • For minimising system call overhead, it is recommended to enable the BPF Just in Time (JIT) compiler. After the BPF program is loaded, the kernel will translate the BPF code into native CPU instructions, for maximum efficiency. It can be configured via: /proc/sys/net/core/bpf_jit_enable.
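The first recommendation in the list, checking which actions the running kernel supports, can be sketched with the standard library alone by reading the procfs file mentioned there. The function names below are hypothetical and not part of seccompiler. The parser assumes the file holds a whitespace-separated list of action names such as `kill_process`, `errno` or `allow`.

```rust
use std::fs;
use std::io;

pub const ACTIONS_AVAIL_PATH: &str = "/proc/sys/kernel/seccomp/actions_avail";

/// Split the file contents into individual action names.
pub fn parse_actions(contents: &str) -> Vec<String> {
    contents.split_whitespace().map(str::to_owned).collect()
}

/// Return the required actions that the kernel does not list as available.
pub fn missing_actions<'a>(available: &[String], required: &[&'a str]) -> Vec<&'a str> {
    required
        .iter()
        .copied()
        .filter(|req| !available.iter().any(|a| a.as_str() == *req))
        .collect()
}

/// Read the list of supported actions from procfs (Linux only).
pub fn read_available_actions() -> io::Result<Vec<String>> {
    Ok(parse_actions(&fs::read_to_string(ACTIONS_AVAIL_PATH)?))
}
```

An application could call `read_available_actions()` at startup and refuse to install a filter when `missing_actions` returns a non-empty list for the actions that filter uses.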

seccompiler's People

Contributors

aghecenco, alindima, andreeaflorescu, boustrophedon, dependabot[bot], hawk777, ramyak-mehra

seccompiler's Issues

Implement From<BackendError> for Error

Without this we need to add boiler plate code when creating custom filters:

....
    SeccompFilter::new(
        rules.into_iter().collect(),
        SeccompAction::Trap,
        SeccompAction::Allow,
        ARCH.try_into().unwrap(),
    )
    .map_err(Error::Backend)?
    .try_into()
    .map_err(Error::Backend)

If we implement From, then this could be simplified by just having:

    SeccompFilter::new(
        rules.into_iter().collect(),
        SeccompAction::Trap,
        SeccompAction::Allow,
        ARCH.try_into().unwrap(),
    )?
    .try_into()
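The mechanism behind this request is that the `?` operator converts the error through `From` before returning it. The sketch below shows the pattern with hypothetical stand-in types. `BackendError`, `Error` and the two functions are simplified for illustration and do not reproduce the library's real definitions.

```rust
/// Stand-in for the backend error type (hypothetical variant).
#[derive(Debug, PartialEq)]
pub enum BackendError {
    EmptyRule,
}

/// Stand-in for the top-level error type that wraps backend errors.
#[derive(Debug, PartialEq)]
pub enum Error {
    Backend(BackendError),
}

// With this impl in place, `?` converts BackendError into Error automatically.
impl From<BackendError> for Error {
    fn from(err: BackendError) -> Self {
        Error::Backend(err)
    }
}

/// A backend-level operation that fails with BackendError.
pub fn build_rule(conditions: &[u64]) -> Result<Vec<u64>, BackendError> {
    if conditions.is_empty() {
        Err(BackendError::EmptyRule)
    } else {
        Ok(conditions.to_vec())
    }
}

/// A caller returning the top-level Error: no `.map_err(Error::Backend)` needed.
pub fn build(conditions: &[u64]) -> Result<Vec<u64>, Error> {
    let rule = build_rule(conditions)?;
    Ok(rule)
}
```

Calling `build(&[])` returns `Err(Error::Backend(BackendError::EmptyRule))` even though `build` never mentions `Error::Backend` explicitly.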

Add condition operator that accepts a list of values

Currently it is not possible to only allow certain values in a filter that is permissive. If we had x in [values] and x not_in [values] operators, it would be possible to express such conditions. Currently we have to list all values that we want to deny. Example of the proposed syntax:

"enable_only_inet": {
  "mismatch_action": "allow",
  "match_action": { "errno": 1},
  "filter": [
     {
       "syscall": "socket",
       "args": [
          {
            "index": 0,
            "type": "dword",
            "op": "not_in",
            "val": [2, 10],
            "comment": "deny all except AF_INET or AF_INET6"
          }
        ]
     }
   ]
}
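One way to think about the proposed `not_in` operator is as a shorthand for a single rule whose conditions are all "not equal" comparisons, since conditions inside a rule are and-bound. The toy model below illustrates that equivalence. The `Op`, `Cond` and helper names are hypothetical, and the existence of a not-equal operator in the library is an assumption here that should be checked against its documentation.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Op {
    Eq,
    Ne,
}

/// Toy condition: compare `args[index]` with `val` using `op`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Cond {
    pub index: usize,
    pub op: Op,
    pub val: u64,
}

/// Expand `args[index] not_in values` into and-bound "not equal" conditions.
pub fn expand_not_in(index: usize, values: &[u64]) -> Vec<Cond> {
    values
        .iter()
        .map(|&val| Cond { index, op: Op::Ne, val })
        .collect()
}

/// A rule matches when every one of its conditions holds.
pub fn rule_matches(rule: &[Cond], args: &[u64; 6]) -> bool {
    rule.iter().all(|c| match args.get(c.index) {
        Some(&arg) => match c.op {
            Op::Eq => arg == c.val,
            Op::Ne => arg != c.val,
        },
        None => false,
    })
}

// Linux address family values used by the example in the issue.
pub const AF_UNIX: u64 = 1;
pub const AF_INET: u64 = 2;
pub const AF_INET6: u64 = 10;
```

With `expand_not_in(0, &[AF_INET, AF_INET6])`, a `socket(AF_UNIX, ..)` call matches the rule (and would receive the errno action), while `AF_INET` and `AF_INET6` sockets do not match and fall through to the permissive default.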

Requirement for “signed” commits in PR template is confusing

The PR template says:

All commits in this PR are signed (with git commit -s)

As someone who doesn’t tend to remember which one is -s and which one is -S, I found this wording confusing: IMO the word “signed”, on its own, means cryptographically signed, i.e. -S; I think this is probably the more common understanding. IMO something like “signed off” would be sufficient to eliminate the confusion, though there might be other rewordings that would be even more clear.

Using seccomp system-wide

Hi.
I have a question about seccomp: can we use seccomp system-wide and trace all processes on the system?
As far as I know, it can only be used by forking the main process and exec-ing the process that is to be traced or restricted. How can we use this for all processes?
Thank you.

[Request] Allow filtering 32 and 64 bits syscalls for x86-64.

At the moment, it's not possible to filter both. If a filtered program calls a 32-bit program, it will result in a bad system call. In libseccomp, one can differentiate between the two by checking for the __X32_SYSCALL_BIT mask on the system call number. This would be very useful for my use case: filtering calls from a sandbox environment that may run 32-bit applications.
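For context, the kernel exposes two separate signals that a filter can use to tell calling conventions apart on x86-64 hardware: the `arch` field of `seccomp_data` distinguishes 64-bit from i386 callers, and the `__X32_SYSCALL_BIT` in the syscall number marks the x32 ABI. The sketch below classifies a call from those two values. The constants are the kernel's values; the `Abi` enum and `classify` function are hypothetical and only illustrate the check, not a seccompiler feature.

```rust
// Values from the kernel headers (linux/audit.h and asm/unistd.h).
pub const AUDIT_ARCH_X86_64: u32 = 0xC000_003E;
pub const AUDIT_ARCH_I386: u32 = 0x4000_0003;
pub const X32_SYSCALL_BIT: i64 = 0x4000_0000;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Abi {
    X86_64,
    X32,
    I386,
    Other,
}

/// Classify a syscall by the `arch` and `nr` fields of `seccomp_data`.
pub fn classify(arch: u32, nr: i64) -> Abi {
    match arch {
        // x32 shares the 64-bit arch value and is marked by a bit in `nr`.
        AUDIT_ARCH_X86_64 if nr >= 0 && nr & X32_SYSCALL_BIT != 0 => Abi::X32,
        AUDIT_ARCH_X86_64 => Abi::X86_64,
        AUDIT_ARCH_I386 => Abi::I386,
        _ => Abi::Other,
    }
}
```

A filter that wants to handle 32-bit callers therefore has to branch on the architecture value first, because syscall numbers differ between the i386 and x86_64 tables.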

Documentation/clarification on SeccompCmpArgLen

Is it basically always okay to use SeccompCmpArgLen::Qword on 64 bit systems? I'm doing so in my tests and I don't seem to see any issues, but I also don't have anything close to exhaustive tests of all syscalls.

Libseccomp seems to always do the full 64 bit comparison on 64 bit systems: https://github.com/seccomp/libseccomp/blob/main/src/db.c#L1665

I don't have any plans currently to support 32 bit systems but I'm just not sure when I might encounter issues. Does it exist basically so that the backend can generate either 32 or 64 bit ebpf regardless of what the host architecture is? i.e. When actually running the ebpf, you should always use Qword on 64 bit systems and Dword on 32 bit systems?
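The practical difference between the two lengths can be shown with a small model. This is a sketch under the assumption that a dword comparison looks only at the low 32 bits of the argument while a qword comparison looks at all 64. The `ArgLen` enum and `arg_eq` function are hypothetical and do not reproduce the library's code.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ArgLen {
    Dword,
    Qword,
}

/// Model of an equality check on a syscall argument register.
pub fn arg_eq(len: ArgLen, arg: u64, expected: u64) -> bool {
    match len {
        // Only the low 32 bits take part in the comparison.
        ArgLen::Dword => (arg as u32) == (expected as u32),
        // All 64 bits must be equal.
        ArgLen::Qword => arg == expected,
    }
}
```

If a syscall parameter is a 32-bit type such as `int` and the upper half of the register happens to hold other bits (for example `0xFFFF_FFFF_0000_0002` for the value 2), the dword comparison still matches while the qword comparison does not. Under that assumption, the argument length is better chosen from the parameter's C type than from the host word size.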

Unable to cross compile for OpenWrt target mips-unknown-linux-musl

related to: SubconsciousCompute/seccomp-pledge#5

dora@openwrtbuildpc:~/coderepo/openwrt/seccomp-pledge$ cargo build --release --target mips-unknown-linux-musl
   Compiling serde v1.0.152
   Compiling libc v0.2.139
   Compiling serde_json v1.0.91
   Compiling itoa v1.0.5
   Compiling ryu v1.0.12
   Compiling optional-fields-serde-macro v0.1.1
   Compiling optional-field v0.1.3
   Compiling seccompiler v0.3.0
   Compiling seccomp-pledge v0.1.0 (/home/dora/coderepo/openwrt/seccomp-pledge)
error[E0432]: unresolved import `seccompiler::BpfMap`
 --> src/main.rs:2:5
  |
2 | use seccompiler::BpfMap;
  |     ^^^^^^^^^^^^^^^^^^^ no `BpfMap` in the root

error[E0433]: failed to resolve: could not find `TargetArch` in `seccompiler`
   --> src/main.rs:411:22
    |
411 |         seccompiler::TargetArch::x86_64,
    |                      ^^^^^^^^^^ could not find `TargetArch` in `seccompiler`

error[E0425]: cannot find function `compile_from_json` in crate `seccompiler`
   --> src/main.rs:409:66
    |
409 | ...compiler::compile_from_json(
    |              ^^^^^^^^^^^^^^^^^ not found in `seccompiler`

error[E0425]: cannot find function `apply_filter` in crate `seccompiler`
   --> src/main.rs:428:21
    |
428 |     if seccompiler::apply_filter(filter).is_err() {
    |                     ^^^^^^^^^^^^ not found in `seccompiler`

Some errors have detailed explanations: E0425, E0432, E0433.
For more information about an error, try `rustc --explain E0425`.
error: could not compile `seccomp-pledge` due to 4 previous errors
dora@openwrtbuildpc:~/coderepo/openwrt/seccomp-pledge$

I think the issue is in the linker during cross-compilation: seccompiler doesn’t define a linker target for mips (https://github.com/rust-vmm/seccompiler/blob/main/.cargo/config). I found this, which may help: rust-lang/rust#37507 (comment)

Consider releasing 0.3.1 or 0.4?

Hi, I was wondering if you'd consider publishing a new version so that the apply_to_all_threads function is available for use in my own crate.

[Request] Add example for `SECCOMP_GET_ACTION_AVAIL`

ISSUE

Overview

Hello, I'm writing concerning the following quote from the docs:

Before installing a filter, make sure that the current kernel version supports the actions of the filter. This can be checked by inspecting the output of: cat /proc/sys/kernel/seccomp/actions_avail or by calling the seccomp(SECCOMP_GET_ACTION_AVAIL) syscall.

Are there any examples of using the second method in practice (seccomp(SECCOMP_GET_ACTION_AVAIL) syscall)? It seems like seccompiler does not expose any way to do this (would be nice if it did but maybe out of scope?), so it seems like I have to either:

  1. stitch different libraries together, one for making syscalls, and libc to get SECCOMP_GET_ACTION_AVAIL, or
  2. write the low-level code manually

If you know of any code that already does this it would save me time, and it could be a useful addition to the docs. :)

Use of BTreeMap can cause inadvertent loss of syscall rules when syscalls are duplicated

In the case where the SeccompFilter rules are generated from a list of (syscall, Vec<SeccompRule>) (as in the doc examples), if a syscall is repeated with different args, the later rule will silently overwrite the earlier one. This is a limitation of the Vec->BTreeMap collect implementation. Given that rules is converted into an iterator over (syscall, Vec<SeccompRule>) during try_from, the use of a map doesn't seem to add anything.

This has implications for ease-of-use of the library. I came across this issue when implementing a port of OpenBSD pledge(). In pledge() semantics, promises are additive; cpath adds the ability to call open() with O_CREAT, and wpath allows open() with O_WRONLY; to open and then write a file you would need to specify both.

The simplest way to implement this is a separate whitelist for each case, and then chain them together as needed, e.g:

let cpath = vec![(
    libc::SYS_open,
    vec![
        Rule::new(vec![Cond::new(
            1,
            ArgLen::Dword,
            CmpOp::MaskedEq(libc::O_CREAT as u64),
            libc::O_CREAT as u64,
        )?])?,
    ],
)];

let wpath = vec![(
    libc::SYS_open,
    vec![
        Rule::new(vec![Cond::new(
            1,
            ArgLen::Dword,
            CmpOp::MaskedEq(libc::O_ACCMODE as u64),
            libc::O_WRONLY as u64,
        )?])?,
    ],
)];

// This list would actually be based on caller parameters
let rules: Vec<(i64, Vec<Rule>)> = cpath.into_iter()
    .chain(wpath)
    .collect();

let sf = SeccompFilter::new(
    rules.into_iter().collect(),
    Action::KillProcess,
    Action::Allow,
    ARCH.try_into()?,
)?;

However the use of BTreeMap means the cpath rule would be completely overwritten by the wpath rule, and attempts to create the file would fail.

The simplest solution would be to use a Vec for the syscall rule list (i.e. just skip the BTreeMap conversion) and leave the partitioning of the syscall rules up to the caller. I've already tried this with v0.3.0 and it works fine; I'm happy to raise a PR. However it would obviously be a breaking change for current users of the lib.
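The overwrite behaviour, and a caller-side workaround that needs no API change, can be reproduced with the standard library alone. In the sketch below the rule type is left generic, so string labels can stand in for real rules. Both function names are hypothetical helpers, not part of seccompiler.

```rust
use std::collections::BTreeMap;

/// Plain `collect()`: for duplicate syscall numbers, only one entry survives.
pub fn collect_overwriting<R>(pairs: Vec<(i64, Vec<R>)>) -> BTreeMap<i64, Vec<R>> {
    pairs.into_iter().collect()
}

/// Merge instead: rules for the same syscall are appended, keeping all of them.
pub fn collect_merging<R>(pairs: Vec<(i64, Vec<R>)>) -> BTreeMap<i64, Vec<R>> {
    let mut map: BTreeMap<i64, Vec<R>> = BTreeMap::new();
    for (nr, rules) in pairs {
        map.entry(nr).or_default().extend(rules);
    }
    map
}
```

Because rules for one syscall are or-bound, appending them preserves the additive semantics described above. One caveat: merging an empty rule vector (which means "match any arguments") with a non-empty one leaves only the non-empty rules, so a caller relying on the empty-vector meaning would need to handle that case separately.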

Support for SECCOMP_FILTER_FLAG_TSYNC

Hi! I'm the author of extrasafe, a Rust wrapper around seccomp (and soon landlock as well, hopefully). I'm looking to switch from libseccomp-rs to seccompiler, mostly to make static compilation easier.

libseccomp supports the SECCOMP_FILTER_FLAG_TSYNC flag, which is a flag you can pass when calling the seccomp syscall directly. It allows you to apply the current seccomp filter to all running threads (TSYNC = thread sync).

Libseccomp achieves this by calling the seccomp syscall directly. It seems that seccompiler uses prctl to enable seccomp, so in addition to adding a new flag to seccompiler::apply_filter, it would also need to be modified to call the syscall itself rather than using prctl.

If you'd be open to accepting a patch I'd be glad to make it - maybe just extracting the body of apply_filter into a new function apply_filter_with_flags, changing it to use the seccomp syscall, and then having apply_filter just proxy to apply_filter_with_flags with empty flags.

If you have a better design or don't want to support it at all, that's fine, just let me know!


Just for reference (mostly for me), here's a convenient link to the seccomp syscall manpage

And here's the libseccomp code that calls the seccomp syscall directly, passing in the flags. See the few lines above it for where the flags are set: https://github.com/seccomp/libseccomp/blob/f1c3196d9b95de22dde8f23c5befcbeabef5711c/src/system.c#L414
