osmosis's Introduction

CellulOS: An implementation of the OSmosis model.

The details of the OSmosis model are available here and here. A wiki of CellulOS is available here

Setup the dev machine

Instructions copied verbatim from sel4test.

The basic build package on Ubuntu is the build-essential package. To install run:

sudo apt-get update
sudo apt-get install build-essential

Additional base dependencies for building seL4 projects on Ubuntu include installing:

sudo apt-get install cmake ccache ninja-build cmake-curses-gui
sudo apt-get install libxml2-utils ncurses-dev
sudo apt-get install curl git doxygen device-tree-compiler
sudo apt-get install u-boot-tools
sudo apt-get install python3-dev python3-pip python-is-python3
sudo apt-get install protobuf-compiler python3-protobuf

Simulating with QEMU

In order to run seL4 projects on a simulator you will need QEMU:

sudo apt-get install qemu-system-arm qemu-system-x86 qemu-system-misc

Cross-compiling for ARM targets

To build for ARM targets you will need a cross compiler:

sudo apt-get install gcc-arm-linux-gnueabi g++-arm-linux-gnueabi
sudo apt-get install gcc-aarch64-linux-gnu g++-aarch64-linux-gnu
# (you can install the hardware floating point versions as well if you wish)

sudo apt-get install gcc-arm-linux-gnueabihf g++-arm-linux-gnueabihf

Setup a new workspace

# Clone OSmosis and all the submodules
git clone --recurse-submodules git@github.com:sid-agrawal/OSmosis.git
cd OSmosis
# Make sure that the cellulos branch is checked out
git submodule foreach git checkout cellulos
git status # This should show no changes, as all the commits should be on the cellulos branch

Build & Run

Qemu

mkdir build
cd build
../init-build.sh -DAARCH64=TRUE  -DPLATFORM=qemu-arm-virt -DSIMULATION=TRUE -DDEBUG=TRUE
ninja
./simulate

Odroid C4

mkdir build
cd build
 ../init-build.sh -DAARCH64=TRUE  -DPLATFORM=odroidc4 -DDEBUG=TRUE
ninja
# Look at notion for steps on how to copy the binary to the board via TFTP

Running with SMP Enabled

Build Arguments

../init-build.sh -DAARCH64=TRUE -DPLATFORM=qemu-arm-virt -DSIMULATION=TRUE -DSMP=TRUE -DDEBUG=TRUE

This enables 4 cores by default; pass -DKernelMaxNumNodes=<CORES> to change this.

SMP with QEMU

If running on WSL, give it at least 8 GB of RAM in the WSL config file (otherwise the tests won't run at all) and at least 4 virtual processors (otherwise it will run very slowly).
Invoke ./simulate -m 8G, with 8G as the minimum. QEMU runs with 4 cores by default; pass -smp <CORES> to change this.

Generate new compile commands

The compile commands file is used for code navigation. This workspace's VS Code settings file is configured to use it.

cd build
bear --output ../compile_commands.json -- ninja

Typical workflow

Let's follow rules to make our lives easier:

All the submodules are using a fork maintained by sid-agrawal.
All OSmosis commits go to the cellulos branch for every module, including the parent OSmosis repo.
Let's not push code to submodules that is not yet reflected in the OSmosis repo. In other words, let's keep them in sync.
- Using git push --recurse-submodules=on-demand should enforce this. More on this below.

Commit your changes

TL;DR: Commit and push the individual submodules first, and then do the same in the parent repo.

Set up this alias once. This alias will get added to your repo-local .git/config

git config alias.supercommit '!./supercommit.sh "$@"; #'

| Note: This will add and commit everything, which may not always be what you want.

Then to commit do:

git supercommit "some message"

cat ./supercommit.sh
#!/bin/bash -e
if [ -z "$1" ]; then
    echo "You need to provide a commit message" >&2
    exit 1
fi

git submodule foreach "
    git add -A .
    git update-index --refresh
    commits=\$(git diff-index HEAD)
    if [ ! -z \"\$commits\" ]; then
        git commit -am \"$1\"
    fi"

git add -A .
git commit -am "$1"

Push all changes

Read the Publishing submodules section here.

It will push the files and the modules refs from OSmosis repo, and if it sees that a particular module ref is not yet pushed, it will push that too.

git push --recurse-submodules=on-demand

Bring in new changes

I am fairly certain this should be okay, but we will see.

# bring in new refs for submodules
git pull --rebase
# Update the code in the modules, if there is a conflict with local, this should complain.

# Then Resolve conflicts, supercommit
[...]
# Push
git push --recurse-submodules=on-demand

Building and Viewing the Doxygen Documentation

Run the following command from the root OSmosis folder:

doxygen Doxyfile

A doxygen directory should've been created, with the html and latex versions of the documentation. Only contents of sel4gpi have been configured for doxygen.

For configuration, see the official doxygen documentation.

osmosis's Issues

RSI: Does the number of resource types in the two PDs being compared need to be the same?

Why do we need |R1| == |R2|?

RSI: Consider that PD1’s resources are a subset of PD2’s resources. Now, we can alter the number of resources PD2 has. This suggests that the RSI for the two PDs is different, but from PD1’s perspective, all its resources will be shared.
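
The issue does not spell out the RSI formula, so the following is only a sketch under an assumption: a symmetric form (shared resources over the union of both PDs' resources) next to a one-sided form (the fraction of PD1's resources that PD2 also holds). The function names and resource labels are hypothetical. It illustrates the subset scenario described above, where growing PD2 changes the symmetric value but not PD1's view.

```python
def symmetric_rsi(r1: set, r2: set) -> float:
    """Assumed symmetric form: shared resources over all resources of both PDs."""
    union = r1 | r2
    return len(r1 & r2) / len(union) if union else 0.0


def one_sided_rsi(r1: set, r2: set) -> float:
    """Assumed per-PD form: fraction of r1's resources that r2 also holds."""
    return len(r1 & r2) / len(r1) if r1 else 0.0


# Hypothetical resource labels; PD1's resources are a subset of PD2's.
pd1 = {"mo0", "mo1"}
pd2_small = {"mo0", "mo1", "mo2"}
pd2_large = pd2_small | {"mo3", "mo4", "mo5"}

# The symmetric value drops as PD2 grows, although PD1 still shares everything.
print(symmetric_rsi(pd1, pd2_small), symmetric_rsi(pd1, pd2_large))
# The one-sided value stays the same from PD1's perspective.
print(one_sided_rsi(pd1, pd2_small), one_sided_rsi(pd1, pd2_large))
```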

Port OpenSSL

Try and get this running.

/docs/man3.0/man7/crypto.html (openssl.org)

Used in the evals of both VDOM (ASPLOS 2023) and LWC (OSDI 2016)

Questions based on a cursory look:

What is fetch? Does it need network access?

Model multiple VMM types

To start, we will model

OSmosis's VMM, which is built on top of the microkit VMM
KVM
Xen
Light VM

Look at sel4-debug from Cantrip

That intercepts the context switch (somehow via Renode) and then switches symbol tables.
It may or may not be possible to do this in Qemu, but we need to see.

There was also a mention of capscan, which dumps all the caps in Code.

https://github.com/AmbiML/sparrow-cantrip-full/

Have execution-context as an entity in the model

Perhaps stating that PD is an active entity in the system was not the right phrasing, which threw off the readers. So, it might be more fitting to have a notion of an execution-context as the active entity. This EC can be attached to a PD to get resources.

This way, two threads are ECs that use the same PD.
An LWC is just one EC switching between two PDs.

Combine Root-Server and seL4 kernel

The two can be combined, as in Genode, to improve performance and reduce the number of IPCs between the Root-Task and the seL4 kernel.

This is also related to
#5

Multiple Questions with the Benchmarking Setup

IPC Test Config

How does n_iter fit in with the 500 runs of the test?
- n_iter is the number of reboots and can be changed from Python
- 500 is the number of reruns without a reboot and is (for now) set by a macro

Basic Test Config

Why did “basic_test_configurations” not print out immediately?
The run for basic fails with:

Traceback (most recent call last):
  File "/home/siagraw/OSmosis/scripts/bench/venv/lib/python3.10/site-packages/pandas/core/internals/construction.py", line 939, in _finalize_columns_and_data
    columns = _validate_or_indexify_columns(contents, columns)
  File "/home/siagraw/OSmosis/scripts/bench/venv/lib/python3.10/site-packages/pandas/core/internals/construction.py", line 986, in _validate_or_indexify_columns
    raise AssertionError(
AssertionError: 11 columns passed, passed data had 550 columns

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/siagraw/OSmosis/scripts/bench/run_benchmarks.py", line 736, in <module>
    df_new = pd.DataFrame.from_records(results, columns=columns)
  File "/home/siagraw/OSmosis/scripts/bench/venv/lib/python3.10/site-packages/pandas/core/frame.py", line 2491, in from_records
    arrays, arr_columns = to_arrays(data, columns)
  File "/home/siagraw/OSmosis/scripts/bench/venv/lib/python3.10/site-packages/pandas/core/internals/construction.py", line 845, in to_arrays
    content, columns = _finalize_columns_and_data(arr, columns, dtype)
  File "/home/siagraw/OSmosis/scripts/bench/venv/lib/python3.10/site-packages/pandas/core/internals/construction.py", line 942, in _finalize_columns_and_data
    raise ValueError(err) from err
ValueError: 11 columns passed, passed data had 550 columns
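
The error says that each record handed to `pd.DataFrame.from_records` was 550 values wide while `columns` named only 11. Since 550 = 50 × 11, one possible (unconfirmed) cause is that a record is a flat list concatenating 50 runs of 11 values. The sketch below, with a hypothetical helper `split_flat_record` and placeholder column names, shows how such a record could be split into rows before it reaches pandas.

```python
def split_flat_record(flat, n_columns):
    """Split one flat list of values into rows of n_columns values each."""
    if len(flat) % n_columns != 0:
        raise ValueError(
            f"{len(flat)} values cannot be split into rows of {n_columns}"
        )
    return [flat[i:i + n_columns] for i in range(0, len(flat), n_columns)]


columns = [f"col{i}" for i in range(11)]  # placeholder names, 11 as in the error
flat = list(range(550))                   # stand-in for one 550-wide record

rows = split_flat_record(flat, len(columns))
print(len(rows), len(rows[0]))

# Each row now matches the column count, so a call such as
# pd.DataFrame.from_records(rows, columns=columns) would receive 11-wide records.
```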

Toy Cleanup

High Variance in toy server cleanup. See xls

https://ubcca-my.sharepoint.com/:x:/g/personal/siagraw_student_ubc_ca/EXf9EliE_9FCsoI15uOAfOwB6mXgkDvnx_4trfFJG7VyaQ?e=12kbBq

Cleanup

GMIBM0014 couldn't be found

Resource Cleanup Design, impl and Eval

Have a Rust PD in CellulOS

This is not immediately needed, and neither is it clear why it is needed. But if we decide to do this, we can look here:

https://github.com/seL4/rust-sel4

It seems to have examples of:

Root Task written in rust where we do not use microkit.
Other PDs written in rust where we do use microkit (I am assuming that the RootTask is non-rust)

Model Intra-kernel Compartmentalization

We should be able to model multiple PDs inside the kernel. The end goal is to:

Model these systems in OSmosis so that we can distinguish them.

To start, we will look at the following papers:

Some questions that are worth answering to help us understand these papers on a common plane:

What is a PD in the system?
- How do we switch PD?
- What is setting up the PD?
- What is enforcing the resources held by execution context, i.e., who is enforcing the notion of the specific PD?
- What are the resources in this PD?
How do the systems manage shared data and pointers?
Is the heap shared ?
- Is the shared heap allowed to have pointers? (Redleaf, LXD, LVD)
How do they draw out the isolation boundaries?
What are the limitations? (that probably requires human intervention)
How are the compartments initialized and cleaned up?
How do the compartments scale? (e.g., support arbitrarily many device drivers)
What tricks did they use to optimize performance?

Modelling: Cellular Disco Vs. BF Vs. Cluster

In a multikernel such as Barrelfish, there are multiple copies of the kernel, but they all operate in the same
address space. This is slightly different from Cellular Disco, in which each kernel is operating in its separate
address space, but the underlying hardware is still shared. This is compared to a cluster of machines, in which the
kernel on each node operates in completely separate hardware resources.
We believe that OSmosis should be able to capture and distinguish between these scenarios.

Port Redis

Look at what/how Redis was run on Barrelfish and see if we need to replicate that.

Model Permission on Hold Edges

This should help with comments about the difference between the code page held by the PD and the same code page held by the kernel. The process PD has R perms to the code page, and the kernel has map perms to the process code page.

We can also use this to capture whether the memory in the resource is interpreted by the PD or not.

There is a notion of “interpreted” vs. “non-interpreted” resource accesses. This is of interest when we talk about the fault radius. If a resource that can be written by PD1 is interpreted by another PD2, then a failure of PD1 may corrupt that resource and thus lead to a crash in PD2 when it tries to interpret it. If the resource is non-interpreted, this is not an issue, e.g., an encryption service will happily encrypt the resource regardless of whether its data makes sense or not. Similarly, there seems to be a distinction between the contents of a resource and the address of a resource. The address of encrypted memory may be read but its contents not deciphered; writing would be another story. I think that may be expressible using permissions on hold edges.
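
A minimal sketch of how permissions and an "interprets" flag on hold edges could feed a fault-radius style question. Every name here (`HoldEdge`, `fault_targets`, the PD and resource labels) is hypothetical and not taken from the CellulOS code; the rule encoded is only the one described above: a PD is at risk if it interprets a resource that the failed PD can write.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HoldEdge:
    pd: str            # protection domain holding the resource
    resource: str      # resource identifier
    perms: frozenset   # e.g. {"R", "W", "MAP"}
    interprets: bool   # True if the PD parses the contents of the resource


def fault_targets(edges, failed_pd):
    """PDs that interpret a resource the failed PD is able to write."""
    writable = {e.resource for e in edges
                if e.pd == failed_pd and "W" in e.perms}
    return {e.pd for e in edges
            if e.pd != failed_pd and e.resource in writable and e.interprets}


# Hypothetical model state: pd1 writes a buffer, pd2 parses it,
# and an encryption service reads it without interpreting it.
edges = [
    HoldEdge("pd1", "buf", frozenset({"R", "W"}), interprets=False),
    HoldEdge("pd2", "buf", frozenset({"R"}), interprets=True),
    HoldEdge("crypto", "buf", frozenset({"R"}), interprets=False),
]
print(fault_targets(edges, "pd1"))
```

In this sketch only pd2 is reported for a failure of pd1; the encryption service is excluded because it does not interpret the buffer.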

submarine tooling

submarine is the tool for exploring the design space of isolation mechanisms.

This issue depends on the following tasks

Catching Up

Impl Things that Sid Needs to understand

Protobuf stuff setup/working.
Message Queues
All Test Cases
Performance Runs
Cap Tracking
Cap Exchange
Model State tracking

Creating and deleting resource servers

Infra Pieces

Basic Types

Diff between VMR and MO

/**
 * Resource type is either:
 * - Core type: one of the enum values defined for GPICAP_CORE_TYPE
 * - Dynamic type: dynamic value assigned by alloc_new_resource_type
*/
typedef enum GPICAP_CORE_TYPE
{
    // Core cap types
    GPICAP_TYPE_NONE = 0,
    GPICAP_TYPE_ADS,    ///< An address space
    GPICAP_TYPE_VMR,    ///< A virtual memory region
    GPICAP_TYPE_MO,     ///< A memory object
    GPICAP_TYPE_CPU,    ///< A CPU object, currently primarily a TCB
    GPICAP_TYPE_PCPU,   ///< Physical CPU core
    GPICAP_TYPE_PD,     ///< A PD
    GPICAP_TYPE_EP,     ///< An endpoint, this is not an actual OSmosis model resource type
    GPICAP_TYPE_RESSPC, ///< A resource space
    GPICAP_TYPE_seL4,   ///< An seL4 kernel object
} gpi_cap_t;
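
The comment on the enum describes two kinds of resource type: fixed core types and dynamically allocated ones. The sketch below mirrors the core values in Python and adds a hypothetical allocator standing in for `alloc_new_resource_type`. That dynamic IDs are numbered right after the last core type is an assumption made for illustration, not something the source states.

```python
from enum import IntEnum


class GpiCapCoreType(IntEnum):
    """Python mirror of the core values of the C enum GPICAP_CORE_TYPE."""
    NONE = 0
    ADS = 1      # address space
    VMR = 2      # virtual memory region
    MO = 3       # memory object
    CPU = 4      # CPU object, currently primarily a TCB
    PCPU = 5     # physical CPU core
    PD = 6       # protection domain
    EP = 7       # endpoint (not an actual OSmosis model resource type)
    RESSPC = 8   # resource space
    SEL4 = 9     # seL4 kernel object


class ResourceTypeAllocator:
    """Hypothetical stand-in for alloc_new_resource_type."""

    def __init__(self):
        # Assumption: dynamic types are numbered after the last core type.
        self._next = max(GpiCapCoreType) + 1
        self._names = {}

    def alloc(self, name: str) -> int:
        """Hand out the next unused type ID and remember its name."""
        type_id = self._next
        self._next += 1
        self._names[type_id] = name
        return type_id

    def is_core(self, type_id: int) -> bool:
        """True if type_id is one of the fixed core types."""
        return any(type_id == member.value for member in GpiCapCoreType)


allocator = ResourceTypeAllocator()
file_type = allocator.alloc("file")  # "file" is an illustrative dynamic type
print(file_type, allocator.is_core(file_type), allocator.is_core(GpiCapCoreType.MO))
```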

Resource Space Boot Strap Code

How is RS different from Component in the impls

How is RT Talking to various PD

Protobuf

Message Queues

Runtime Changes

Scenarios

VMM Scenario

Container Scenario

High Jump Scenario

Framework: Be explicit about what the framework is.

It is not clear what the framework is, and here are some options.

1. The framework is some library/system with hooks that an OS would need to implement to extract the model state, a bit like how newlib has a bunch of functions that need to be implemented by the OS designer.
2. The framework provides some mechanisms that an OS can use to realize the PDs given a set of available hardware features (e.g., MPK, VMX, SGX, ...). I could see this being support for doing function calls/IPC across PD boundaries.

I think it is both. (1) is what we use for comparing PDs and (2) is what we use to build a variety of PDs.

It can also be merged into the impl section if we cannot clearly define what the framework is.

Modelling: Trusted Execution Environments

We need to ascertain that the OSmosis model can capture the isolation guarantees provided by a Trusted Execution Environment (TEE). In its most basic form, a TEE is supposed to provide confidentiality and integrity for the code and data running inside the TEE from the OS. We will examine Intel SGX (Software Guard Extensions) and ARM TrustZone. The two implementations of the TEE concept are sufficiently different that modeling just these two should suffice to determine whether OSmosis can adequately model TEEs. In SGX, the process data is confidential from the OS. In TrustZone (TZ), an entirely isolated OS instance is confidential from the primary OS.

The following should be clear from the model depiction:
• That the OS cannot access the data pages of the application
• A lower-level software (e.g., firmware in ARM) is still trusted.

This issue relates to the conversation about whether data is interpreted discussed in #41

RSI: Enhance with Access Patterns

Adapt RSI to capture access patterns?

Enhance RSI if there is intra-PD interaction via that resource.

I'm not sure what the formula for that might be.

If we want to keep numerical values to RSI, we need to devise an app-agnostic way of measuring how often the app is using the resource. For memory resources, something like PEBS could be used.

We could track resources provided by PDs in real-time.

However, an RSI value could change over time just based on access patterns, and I'm not sure how we will reason about that.

Model Cross PD Communication

Two PDs can communicate in many ways and the allowed methods are a function of their isolation.
We need to ensure that each of these communication methods can be modeled in OSmosis.

IPC via pipe
seL4 style IPC
Shared Memory (no intermediary)
Anything else ..

Multiple Communication Backends

System Topology Service

The topology information of a board contains information like how many cores are present and which RAM banks are closer to which cores, what is the cache config and what is the {VA/PA} mapping to a set of cache.

The topology service provides some of the RR, which is static, depends on the board, and, most importantly, cannot be changed by the OS.

Given a PA return:
1. Which cache sets it can be in for a given CPU core
2. Which DRAM buses it can touch
Given a CPU Core return:
1. Which DRAM is near
2. Which other cores it shares each of the following with:
  1. L1 cache
  2. L2 cache
  3. L3 cache
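
The queries listed above can be sketched as a small static lookup structure. Everything below is illustrative: the class names, the 64-byte line size, the 256 sets, and the four-core layout with private L1 and shared L2 are placeholder values, not the actual ODroid topology. The set-index rule used is the usual one for a physically indexed cache, (PA / line size) mod number of sets.

```python
from dataclasses import dataclass, field


@dataclass
class Core:
    core_id: int
    near_dram: str                              # DRAM bank closest to this core
    caches: dict = field(default_factory=dict)  # level -> cache instance id


@dataclass
class Topology:
    cores: list
    cache_line: int = 64   # bytes per cache line (placeholder)
    num_sets: int = 256    # sets per cache (placeholder)

    def cache_set_for_pa(self, pa: int) -> int:
        """Set index for a physically indexed cache."""
        return (pa // self.cache_line) % self.num_sets

    def near_dram(self, core_id: int) -> str:
        """DRAM bank recorded as nearest to the given core."""
        return next(c.near_dram for c in self.cores if c.core_id == core_id)

    def cores_sharing(self, core_id: int, level: str) -> list:
        """Other cores that use the same cache instance at the given level."""
        me = next(c for c in self.cores if c.core_id == core_id)
        if level not in me.caches:
            return []
        return [c.core_id for c in self.cores
                if c.core_id != core_id and c.caches.get(level) == me.caches[level]]


# Made-up board: four cores, private L1 caches, one shared L2, one DRAM bank.
topo = Topology(cores=[
    Core(i, "dram0", {"L1": f"l1-{i}", "L2": "l2-0"}) for i in range(4)
])
print(topo.cache_set_for_pa(0x40001040))
print(topo.near_dram(2), topo.cores_sharing(0, "L1"), topo.cores_sharing(0, "L2"))
```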

Details of ODroid

Reason about FR when the ancestor PD is formally verified to be correct

One of the criticisms of the current model is that:

a. FR=1 for two processes in seL4 with a correct caps setup
b. FR=2 for two processes in linux-KVM guests.

But folks would consider (a) to be more secure than (b). We need to somehow capture that in the first case the kernel is trusted, and in the second case the VMM is trusted. And since the kernel is formally verified, the FR path via the kernel cannot lead to bug exploits.

But what about resource exhaustion in the kernel? Is that a bug?

Model Devices

This is linked to the issue around impl of device drivers:

and to the modeling of HW as a PD in general

Different Types of PDs in the context of nested PDs

When we say that there are nested PDs (of guest processes) inside a VM, or nested PDs (of MPK-based compartments) inside a process, we are saying two things:

CellulOS does not maintain the nested PD's abstraction of PD.
- In the VM case, the guest process abstraction is created by the guest OS
- In the MPK based process memory isolation case, the domain abstraction is created by the TCB in the process.
The component that needs to be compromised to circumvent this PD isolation is different.
- Every domain switch goes via the TCB which maintains the abstraction

Thus, do we need to create a notion in our model that every PD has a provider PD, the one that is creating that abstraction?
Maybe if we think of a PD as a memory resource, then the memory that stores that list of resources in the PD is the PD creator.

The PD in CellulOS is provided by the Root-Server
But the PD inside the process (MPK-based), let's call it PD', is provided by the TCB inside the process.

| Popular Name | PD Definition | Who is doing setup | Where is config data stored | Who is enforcing at run time? |
| --- | --- | --- | --- | --- |
| CSpaces | All cap accessible in CSpace | uKernel | kernel Data | User/Kernel Mode and MMU |
| Virtual Memory | All Mapped Page | uKernel + Memory Manager | PT | MMU |
| CHERI Cap | DAG of all caps accessible from regs | Kernel/App | Tag Bit | CHERI CPU (Tag Bit) |
| MPK | All pages which have the same key as the current pku_reg | Kernel + MMU + App | PT | MMU |
| ARM MD | All pages which have the same key as the current pku_reg | Kernel + MMU | PT | MMU |
| MTE | | Kernel + MMU + App | PT + VSpace | MMU |
| Virtual Machine? | All Physical Resources | HV | | |
| M3 | Hardware CSpace + Virtual Memory | | | |

This issue is closely related to #35

Model Intra-address space mechanisms

To start off, we will model

MPK
MD
MTE
CHERI
PAC - This relates to the previous task, too

Look at the following paper from ATC 2024 for a summary of all the features. In principle, this paper should have all the related work we will need to do this part of the project.

Limitations and Opportunities of Modern Hardware Isolation Mechanisms

Once a mechanism is picked, implementing it depends on packaging the needed data in the ELF; for notes on that, look at

Should we use ASan ?

Discussed in #11

Originally posted by sid-agrawal June 12, 2024
Given the recent bugs we have encountered related to buffer not being initialized and pointer errors, I was wondering if we will benefit from using something like https://github.com/google/sanitizers/wiki/AddressSanitizer in our code.

For now, even if we do not use it, let's use this thread to keep track of issues we find that could have been caught automatically using ASan.

We also need to check if our compiler has the option to enable ASan. gcc is supposed to have it.

@astevins @p-linh

Get OSmosis State from Linux using `/proc`

Get the OSmosis state from Linux for the memory resource.

To start, we can look at /proc/PID/pagemap to get the VA-->PA mapping and at /proc/PID/maps (the data behind the pmap tool) to extract memory-related state.
This is linked to #25.
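
A minimal sketch of reading these two files, assuming a Linux host and 4 KiB pages. The helper names are hypothetical. Each pagemap entry is 64 bits per virtual page, with bit 63 marking the page as present and bits 0-54 holding the page frame number; on recent kernels the PFN reads as zero unless the reader has CAP_SYS_ADMIN, so `virt_to_phys` returns None in that case too.

```python
import struct

PAGE_SIZE = 4096  # assumption: 4 KiB pages; os.sysconf("SC_PAGE_SIZE") gives the real value


def parse_maps_line(line: str):
    """Parse one line of /proc/PID/maps into (start, end, perms, path)."""
    fields = line.split(maxsplit=5)
    start, end = (int(x, 16) for x in fields[0].split("-"))
    path = fields[5].strip() if len(fields) > 5 else ""
    return start, end, fields[1], path


def decode_pagemap_entry(raw: bytes):
    """Decode one 8-byte pagemap entry into (present, pfn)."""
    (entry,) = struct.unpack("=Q", raw)   # native byte order, 8 bytes
    present = bool((entry >> 63) & 1)     # bit 63: page present in RAM
    pfn = entry & ((1 << 55) - 1)         # bits 0-54: page frame number
    return present, pfn


def virt_to_phys(pid: int, vaddr: int):
    """Physical address of vaddr, or None if it is not present or the PFN is hidden."""
    with open(f"/proc/{pid}/pagemap", "rb") as f:
        f.seek((vaddr // PAGE_SIZE) * 8)  # one 8-byte entry per virtual page
        present, pfn = decode_pagemap_entry(f.read(8))
    if not present or pfn == 0:
        return None
    return pfn * PAGE_SIZE + vaddr % PAGE_SIZE


# Example on a made-up maps line (no /proc access needed for this part).
sample = "00400000-0040b000 r-xp 00000000 08:01 1234 /usr/bin/cat"
print(parse_maps_line(sample))
```

Walking every region from /proc/PID/maps and calling `virt_to_phys` per page would give the VA-->PA edges for the memory resource; that loop is left out here because it needs root on most systems.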

Subtasks:

[] Run a Linux image that we compiled and run it with our VMM
[] Figure out a way for the VM to communicate with the VMM
- [] Shared memory
- [] hypercall

Run queries on model state at runtime

This directly depends on #29. If we do that issue, then look into how to run simple queries at runtime.

Port sDDF to OSmosis

Get the different drivers from sDDF working on OSmosis.
This is linked to #27 which is about the modeling aspect.

Objective

Understand how the sDDF examples work and get them running on our odroid-c4 board.
1. This involves understanding the microkit too.
Port these driver examples to the OSmosis repo.
1. The focus here isn't to use anything special from OSmosis’ framework but to make sure the examples run in our code. The way a new process is created in microkit vs. OSmosis (which is just seL4test for now) is different.
(More research), what building blocks can we build in OSmosis that are more device-specific.
First get the Qemu version running and see if they are enough to answer our questions.

Notes

odroidc4 DNS [sid-odroid.cs.ubc.ca](http://sid-odroid.cs.ubc.ca/), ip 198.162.52.47

Resources

sDDF

Check out the slides and video about the “seL4 Device Driver Framework (sDDF)”, we want to leverage as much of that as possible

[slides from seL4 summit 2022](https://sel4.systems/Foundation/Summit/2022/slides/d1_06_The_seL4_Device_Driver_Framework_(sDDF)_Lucy_Parker.pdf)
[Video of the same slides](https://www.youtube.com/watch?v=gDXsnBhNiQM)
Their source code:
- Updated https://github.com/au-ts/sDDF

microkit

sDDF is built on top of seL4’s microkit, so it is a good idea to look at that too. In a nutshell, microkit is a way to create static systems on top of seL4, and since we are making dynamic systems it doesn’t work for us.

https://github.com/seL4/microkit/blob/main/docs/manual.md

microkit http server demo (in Rust)

This example has a virtio network driver. Although it is written in Rust, it can serve as a reference if we want to write one in C.

https://github.com/seL4/rust-microkit-http-server-demo

Other Generic Resources

Here are some other related resources for the ethernet and lwip stuff

Tutorials for driver

Protocol

[lwIP - Wikipedia](https://en.wikipedia.org/wiki/LwIP)

[lwIP: System initalization (nongnu.org)](https://www.nongnu.org/lwip/2_1_x/sys_init.html)

[picotcp | PicoTCP is a free TCP/IP stack implementation (altran.be)](http://picotcp.altran.be/)

[Writing Virtio Drivers — The Linux Kernel documentation](https://docs.kernel.org/driver-api/virtio/writing_virtio_drivers.html)

Older seL4 drivers

The [seL4 page on available components](https://docs.sel4.systems/projects/available-user-components.html) says that it has support for ethernet drivers, and they are inside the [camkes project](https://docs.sel4.systems/projects/camkes/).

Looks like the user app is a camkes component
- Ethdriver in ~/sel4/camkes-project/projects/camkes/apps/
- see here on how to run this: https://docs.sel4.systems/projects/camkes/
EthernetDriver:
- [projects/global-components/components/Ethdriver/README.md](https://github.com/seL4/global-components/tree/master/components/Ethdriver)
- E1000 driver is there. Intel e1000 is a family which consists of 82580 and 82574
picotcp in sel4: projects/picotcp
From the original src picoTCP:
- https://github.com/tass-belgium/picotcp/wiki/Example-device-driver
- [Device Drivers · tass-belgium/picotcp Wiki (github.com)](https://github.com/tass-belgium/picotcp/wiki/Device-Drivers)
pico_eth_send
- eth_driver->i_fn.raw_tx (which for intel is intel.c:raw.tx)
Also look at:
- projects/global-components/remote-drivers/picotcp-ethernet-async

Modelling CGroups

How would we model cgroups in our model?

This will require us to model quotas of CPU time and memory (in bytes). So far we have not done this, and have mainly focused on specific CPU and memory resources.

Model Address Space Compz

We need to show more evidence that the model can capture

Intel MPK
ARM MD
MTE
CHERI
SpaceJMP and LWC (What we have in the current version is not enough)

OSDI TODOs

Make a list of things that we want to convince the reader of.
Look at the current para-level outline and see how it needs to be tweaked.

Formally Specify the constraints on each element in the model

Track/Find the memory storing model elements?

If the memory region that stores the mapping edge can be compromised, the mapping between resources can be changed. This is perhaps true for all the edges in our model.

So, find where the memory is for each node and edge type, and see which PDs can RW to it.

Some examples of memory that stores the edge information

| Edge Type | Example |
| --- | --- |
| `map` | PTE+VAS, Inode |
| `hold` | CNode+`pd_t` |
| `request` | `pd_t` |
| `subset` | resource-server's memory |
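
A sketch of the check proposed above: given where each edge type is stored and which PDs can write that storage, list the PDs able to rewrite edges of a given type. The backing stores come from the table; the `WRITERS` mapping and the PD names in it are made up for illustration and do not describe the real system.

```python
# Memory that stores each edge type, following the table above.
EDGE_BACKING = {
    "map": ["PTE+VAS", "Inode"],
    "hold": ["CNode+pd_t"],
    "request": ["pd_t"],
    "subset": ["resource-server memory"],
}

# Hypothetical write access: which PDs can write each backing store.
WRITERS = {
    "PTE+VAS": {"kernel"},
    "Inode": {"fs_server"},
    "CNode+pd_t": {"kernel", "root_task"},
    "pd_t": {"root_task"},
    "resource-server memory": {"fs_server"},
}


def pds_that_can_rewrite(edge_type: str) -> set:
    """Union of the PDs that can write any memory backing this edge type."""
    pds = set()
    for store in EDGE_BACKING[edge_type]:
        pds |= WRITERS.get(store, set())
    return pds


for edge_type in EDGE_BACKING:
    print(edge_type, sorted(pds_that_can_rewrite(edge_type)))
```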

Explicitly showcase multiple resource types

Model more resources

HW:
- Cache sets
- Branch pred: Or show how we cannot model it w/o xyz information.
- Exception handler (not sure what that means)
SW:
- Inode,
- PTEs #38

Modelling: Differences between μ-kernel and monolithic kernel

The kernel itself is modeled as a protection domain in the OSmosis model, therefore the differences between
a μ−kernel and monolithic kernel should be visible in the model state. For instance, the VAS (virtual address
space) resource should be managed by the kernel in the case of a monolithic kernel and by a memory-management
protection domain in the case of a μ−kernel.

I think this is closely linked to two issues around

#41 and #38

Phrase Model Queries

Convert Metric to Queries?

Comparing an RSI of 0.75 to 0.95 will always be hairy as long as we do not try to characterize what the PD is doing with the resource.

The argument that since we are comparing the same 2 applications in different scenarios and hence the shared resources will have similar access patterns is not coming across.

I personally feel that numerical values for RSI are not the way to go, as that is a can of worms we do not want to open. I suggest that we go back to Queries on the model state. I can present some options in an OS meeting.

Look at older notes on Notion and add some sample queries here.

Port webserver based on micropython

The latest LionsOS demo has an example NFS client and a Python website based on MicroPython (bare-metal Python). If we need it for eval, port this code.

https://lionsos.org/docs/kitty/

func
domain-switch
TCP
shared-mem

Relevant papers

Flick: A Flexible, Optimizing IDL Compiler
Flounder
nanopb ?: We use nanoPB to make IPC easier, can it not be used to make a wrapper easier?