
doytsujin / nabla-linux


This project forked from nabla-containers/nabla-linux


Experimental Linux Virtual Machine based on UML and noMMU



This is an experimental type of Linux virtual machine that does not use a hypervisor: no monitor, no emulation, and no hardware virtualization. A guest runs multiple processes in the same address space. The whole guest is a single host process running on top of 12 syscalls, sandboxed using seccomp. The guest kernel is a modified Linux configured with User-Mode Linux (UML) and no-MMU.

The best way to think of this is as a modified User-Mode Linux (UML) that is faster and more secure, but significantly less general-purpose.


Try it with this one-liner

docker run --rm -it kollerr/linux-um-nommu

The source Dockerfile is tests/docker/linux-um-nommu/Dockerfile. It uses the alpine-test.ext3 image built by tests/Makefile, which is in turn based on tests/docker/alpine/Dockerfile.

Introduction

Container runtimes have been using virtualization to improve isolation (e.g., Kata Containers). To make these VMs feel like regular containers, the community has been slimming down their virtual machine (VM) monitors (e.g., Firecracker). This experimental "VM" is what happens when you slim down to the extreme: no monitor at all.

Nabla Linux is a Linux virtual machine that runs as a single unprivileged user-level process on top of only 12 syscalls. We achieve isolation equivalent to virtual machines, without a monitor, by using seccomp to restrict the VM process to these 12 system calls. The system combines two well-known Linux features:

  • User-Mode Linux (UML).
  • No-MMU support, normally used for embedded devices. It is applied both in the kernel and in userspace (musl and busybox).
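The following is a minimal C sketch of a seccomp allowlist built from raw classic-BPF instructions, to make the "12 syscalls under seccomp" idea concrete. It is not nabla-linux's actual filter, for several reasons:

  • The 12 syscalls are not listed here, so the demo uses a hypothetical two-entry allowlist (exit, exit_group).
  • Every other syscall fails with EPERM so the effect can be observed. A real sandbox would more likely kill the process.
  • The names install_allowlist and probe_allowlist are illustrative.
  • The code assumes an x86_64 or aarch64 Linux host.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#include <linux/audit.h>
#include <linux/filter.h>
#include <linux/seccomp.h>

#if defined(__x86_64__)
#define SKETCH_AUDIT_ARCH AUDIT_ARCH_X86_64
#elif defined(__aarch64__)
#define SKETCH_AUDIT_ARCH AUDIT_ARCH_AARCH64
#else
#error "this sketch only covers x86_64 and aarch64"
#endif

#define MAX_ALLOWED 32

/* Hypothetical helper: allow only the syscall numbers in `allowed`;
 * any other syscall fails with EPERM. */
static int install_allowlist(const long *allowed, size_t n) {
    struct sock_filter prog[4 + MAX_ALLOWED + 2];
    size_t i = 0;
    assert(n <= MAX_ALLOWED);

    /* Kill the process if the syscall comes from an unexpected ABI. */
    prog[i++] = (struct sock_filter)BPF_STMT(BPF_LD | BPF_W | BPF_ABS,
                                             offsetof(struct seccomp_data, arch));
    prog[i++] = (struct sock_filter)BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K,
                                             SKETCH_AUDIT_ARCH, 1, 0);
    prog[i++] = (struct sock_filter)BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL);

    /* Load the syscall number and compare it against each allowed entry.
     * A match jumps over the remaining checks and the EPERM return to ALLOW. */
    prog[i++] = (struct sock_filter)BPF_STMT(BPF_LD | BPF_W | BPF_ABS,
                                             offsetof(struct seccomp_data, nr));
    for (size_t k = 0; k < n; k++)
        prog[i++] = (struct sock_filter)BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K,
                                                 (unsigned int)allowed[k],
                                                 (unsigned char)(n - k), 0);
    prog[i++] = (struct sock_filter)BPF_STMT(BPF_RET | BPF_K,
                                             SECCOMP_RET_ERRNO | (EPERM & SECCOMP_RET_DATA));
    prog[i++] = (struct sock_filter)BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW);

    struct sock_fprog fprog = { .len = (unsigned short)i, .filter = prog };
    /* Required so an unprivileged process may install a filter. */
    if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) != 0)
        return -1;
    return prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &fprog);
}

/* Fork a child, sandbox only the child, and check that a syscall outside
 * the allowlist (getpid) is refused. Returns 0 on the expected outcome. */
int probe_allowlist(void) {
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        const long allowed[] = { SYS_exit, SYS_exit_group };  /* hypothetical list */
        if (install_allowlist(allowed, sizeof allowed / sizeof allowed[0]) != 0)
            _exit(2);
        long r = syscall(SYS_getpid);  /* not on the list */
        _exit(r == -1 && errno == EPERM ? 0 : 1);
    }
    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

The filter is installed in a forked child so the calling process stays unrestricted. Once installed, a seccomp filter cannot be removed and is inherited by any children.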


Our initial experiments show that this Linux VM can run multiple unmodified Alpine binaries (such as python, nginx, and redis) and can boot in 6 milliseconds, the fastest boot time we know of. This comes with some limitations:

  • Only PIE executables are supported.
  • There is no fork; processes are emulated using vfork.
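Without fork, a program has to start new programs with vfork followed by exec. The following is a minimal C sketch of that pattern, assuming a POSIX host. run_with_vfork is a hypothetical helper, not part of the project.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run `path` with `argv` via vfork + exec instead of
 * fork + exec. Returns the child's exit status, 127 if exec failed,
 * or -1 on error / abnormal termination. */
int run_with_vfork(const char *path, char *const argv[]) {
    assert(argv != NULL && argv[0] != NULL);
    pid_t pid = vfork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        /* The child borrows the parent's memory until it execs or exits,
         * so it must do nothing else here (no stdio, no return). */
        execv(path, argv);
        _exit(127);  /* only reached if execv failed */
    }
    /* The parent resumes once the child has exec'd or exited. */
    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

posix_spawn(3) wraps the same create-then-exec pattern and is the more portable choice in application code. The vfork version is shown because it matches how processes are emulated here.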

Demo (asciicast):

The recording shows a run with the host syscalls on the left. Its point is that many applications just work while running on top of a small set of syscalls.

Build and test

A single make at the root should build Linux, musl, and busybox. You then need a disk image; think of it as the VM. There are two ways to get one:

  • Create a raw disk file based on Alpine with the alpine-test.ext3 target in tests/.
  • Run make demo in tests/, which builds the image and then runs it.

make
cd tests && make demo

Related projects

  • Linux Kernel Library (LKL), which also uses the NOMMU config but has a different use case: it is meant to be used as a library instead of as a "VM". There are two very interesting developments related to LKL:

  • gVisor, which resembles UML when running in ptrace mode (one host process per guest process, trapped using ptrace).

  • The solo5-spt monitor, which runs unikernels as a single process sandboxed using seccomp (the same idea).

Limitations

  • No virtual memory and no memory protection. All processes share a single address space, so a process that writes to the NULL page will "kill" every process running in the VM. This is not what you would expect from a VM.
  • No sys_fork. This is partially solved by supporting vfork (and posix_spawn). The catch is that applications need to use vfork or posix_spawn instead of fork and exec, as busybox does when configured for NOMMU. Applications that call sys_fork get EINVAL. The most common use of fork and exec (running a new program) is the shell, which is why we need busybox configured for NOMMU. Other applications like nginx or redis don't fork (at least we haven't seen them fork), so they don't need to be patched.
  • Can only run PIE executables. This is the case for most binaries in Alpine Linux, as explained in the "Secure" section of Alpine's about page.
  • Requires our modified musl libc. This libc can make syscalls over vsyscall (i.e., a function call instead of the syscall instruction).
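Because only PIE executables can run, you can screen a binary by reading its ELF header. PIE executables have type ET_DYN, while classic non-PIE executables are ET_EXEC. The following is a minimal C sketch, assuming a Linux host with <elf.h>; elf_is_dyn is a hypothetical helper. ET_DYN also matches shared libraries, so this is a quick filter rather than a complete check.

```c
#include <assert.h>
#include <elf.h>
#include <fcntl.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if `path` is an ELF file of type ET_DYN
 * (PIE or shared object), 0 for any other ELF type, and -1 if the file
 * cannot be read or is not ELF. */
int elf_is_dyn(const char *path) {
    assert(path != NULL);
    /* e_type immediately follows the 16-byte e_ident in both ELF32 and ELF64. */
    unsigned char hdr[EI_NIDENT + 2];
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    ssize_t n = read(fd, hdr, sizeof hdr);
    close(fd);
    if (n != (ssize_t)sizeof hdr || memcmp(hdr, ELFMAG, SELFMAG) != 0)
        return -1;

    /* e_type is stored in the file's own byte order (EI_DATA). */
    uint16_t type;
    if (hdr[EI_DATA] == ELFDATA2LSB)
        type = (uint16_t)(hdr[EI_NIDENT] | (hdr[EI_NIDENT + 1] << 8));
    else if (hdr[EI_DATA] == ELFDATA2MSB)
        type = (uint16_t)((hdr[EI_NIDENT] << 8) | hdr[EI_NIDENT + 1]);
    else
        return -1;
    return type == ET_DYN ? 1 : 0;
}
```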
