
marginresearch / cannoli


High-performance QEMU memory and instruction tracing

License: GNU General Public License v2.0

Rust 96.86% C 0.88% Nix 0.93% Assembly 1.29% Gnuplot 0.04%

cannoli's Introduction


Cannoli

Cannoli is a high-performance tracing engine for qemu-user. It can record a trace of both the PCs executed and the memory operations performed. It consists of three parts: a small patch to QEMU that exposes locations where code can be injected directly into the JIT, a shared library that is loaded into QEMU to decide what and how to instrument, and a final library that consumes the stream produced by QEMU in another process, where analysis can be done on the trace.

Cannoli is designed to record this information with minimal interference with QEMU's execution. In practice, this means that QEMU needs to produce a stream of events and hand them off (very quickly) to another process that handles the more complex analysis. Doing the analysis during execution of the QEMU JIT itself would dramatically slow down execution.

Cannoli can handle billions of target instructions per second, can handle multi-threaded qemu-user applications, and allows multiple threads to consume the data from a single QEMU thread to parallelize processing of traces.

Is it fast?

Graph showing 2.2 billion instructions/sec

Performance with a single QEMU thread running the benchmark example on an Intel Xeon Silver 4310 @ 2.1 GHz. The target is mipsel-linux, running a hot loop of unrolled nops to benchmark PC tracing bandwidth (the worst case for us).

Example symbolizer

For an example, check out the symbolizer! Here's the kind of information you can get!

Example symbolizer showing memory accesses and PC executions

TL;DR Getting it running

Build Cannoli

git clone https://github.com/MarginResearch/cannoli
cd cannoli
cargo build --release

Checkout QEMU

git clone https://gitlab.com/qemu-project/qemu.git

Switch to the QEMU commit we're working against

cd qemu
git checkout 78385bc738108a9b5b20e639520dc60425ca2a5a

Apply patch from qemu_patches.patch

git am --3way </path/to/cannoli>/qemu_patches.patch

Build QEMU for your desired targets (example mipsel and riscv64)

./configure --target-list=mipsel-linux-user,riscv64-linux-user --extra-ldflags="-ldl" --with-cannoli=</absolute/path/to/cannoli>
make -j48

Try out the example symbolizer

In one terminal, start the symbolizer

cd examples/symbolizer
cargo run --release

In another terminal, run the program in QEMU with Cannoli!

cd examples/symbolizer
</path/to/qemu>/build/qemu-mipsel -cannoli </path/to/cannoli>/target/release/<your jitter so>.so ./example_app

Using the Nix flake

If you have Nix, getting a build of qemu with the Cannoli patches is just:

nix --experimental-features 'nix-command flakes' build

If desired, you may also skip building and download pre-built binaries by answering y when Nix asks whether to allow the following settings:

do you want to allow configuration setting 'extra-substituters' to be set to 'https://cannoli.cachix.org' (y/N)? y
do you want to allow configuration setting 'extra-trusted-public-keys' to be set to 'cannoli.cachix.org-1:nFKY7lRczFkkHacy6/OlfmpOU22MeEiDo90YV0QkVoQ=' (y/N)? y

The supported emulators will be in ./result/bin after a few minutes of compiling.

There's also a nix devshell, which you can use to populate a temporary shell with Cannoli and the supported rust toolchain, for development of applications that consume Cannoli traces:

nix --experimental-features 'nix-command flakes' develop

Coverage Example

Cannoli can be used to get coverage of binary applications pretty cheaply. There's an example provided that uses terrible symbol resolution, but it gives you a rough idea of what you can do.

Build and run the client:

cd cannoli/examples/coverage
cargo run --release

Invoke QEMU on a binary you want coverage of, using the coverage hooks

QEMU_CANNOLI=cannoli/target/release/libcoverage.so qemu/build/qemu-x86_64 /usr/bin/vlc

This should work even for large, many-threaded applications! The coverage is self-silencing, meaning it disables reporting of coverage (in the JIT) by patching itself out once it executes for the first time. You might still get later events for the same callbacks because the same code can be re-JITted, but it's just meant to be a major filter that cuts down on the traffic you would otherwise get with full tracing.
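Because re-JITting can report the same location more than once, a client consuming these events may want to deduplicate them itself. Here is a minimal sketch, assuming coverage events arrive as bare u64 PCs. The CoverageMap type and its methods are hypothetical names used for illustration; they are not part of Cannoli or of the coverage example.

```rust
use std::collections::HashSet;

/// Hypothetical client-side set of PCs that have been reported at least once.
#[derive(Default)]
pub struct CoverageMap {
    seen: HashSet<u64>,
}

impl CoverageMap {
    /// Records `pc` and returns `true` only the first time it is reported.
    /// Repeat reports (for example after code is re-JITted) return `false`.
    pub fn record(&mut self, pc: u64) -> bool {
        self.seen.insert(pc)
    }

    /// Number of distinct PCs recorded so far.
    pub fn unique(&self) -> usize {
        self.seen.len()
    }
}
```

A caller would only print or store a PC when record returns true, so duplicate events cost one hash lookup and nothing more.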

What to do

  1. Create an application using the cannoli library to process traces by implementing the Cannoli trait (see one of the examples)
  2. Create a library using the jitter library to filter JIT hooks by implementing hook_inst and hook_mem. This must be a cdylib that produces the .so that you pass into QEMU with -cannoli. For a basic example of this that hooks everything, see jitter_always
  3. Run your trace-parsing application
  4. Launch QEMU with the -cannoli argument, and a path to the compiled <jitter>.so that you built!

User-experience

To start off, we should cover what you should expect as an end-user.

QEMU patches

As a user you will have to apply a small patch set to QEMU, consisting of about 200 lines of additions. These are all gated with #ifdef CANNOLI, such that if CANNOLI is not defined, QEMU will build identically to having none of the patches in the first place.

The patches aren't too relevant to the user, other than understanding that they add a -cannoli flag to QEMU which expects a path to a shared library. This shared library is loaded into QEMU and is invoked at various points of the JIT.

To apply the patches, simply run something like:

git am qemu_patches.patch

Jitter

The shared library which is loaded into QEMU is called the Cannoli Jitter.

This library expects you to implement two basic callbacks, so that QEMU knows when and how to hook certain operations. This is the filter mechanism that prevents JIT code from being produced in the first place if you do not want to hook literally everything.

use jitter::HookType;

/// Called before an instruction is lifted in QEMU.
///
/// The `HookType` dictates the type of hook used for the instruction, and may
/// be `Never`, `Always`, or `Once`
///
/// This may be called from multiple threads
#[no_mangle]
fn hook_inst(_pc: u64, _branch: bool) -> HookType {
    HookType::Always
}

/// Called when a memory access is being lifted in QEMU. Returning `true` will
/// cause the memory access to generate events in the trace buffer.
///
/// This may be called from multiple threads
#[no_mangle]
fn hook_mem(_pc: u64, _write: bool, _size: usize) -> bool {
    true
}

These hooks provide an opportunity for a user to decide whether or not a given instruction or memory access should be hooked. Returning HookType::Always from hook_inst, or true from hook_mem (as the example above does), instruments the instruction or memory access. Returning HookType::Never or false means that no instrumentation is added to the JIT, and thus QEMU runs with full-speed emulation.
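As a rough illustration of a memory filter, the sketch below shows decision logic that a hook_mem implementation could delegate to. It filters only on the three values in the hook_mem signature shown above (PC, write flag, size). The function name should_hook_mem, the WATCHED_CODE range, and the 4-byte threshold are all assumptions made up for this example.

```rust
use std::ops::Range;

/// Hypothetical PC range of the code whose memory traffic we care about.
const WATCHED_CODE: Range<u64> = 0x0040_0000..0x0041_0000;

/// Decision logic a `hook_mem` implementation could delegate to: trace only
/// writes of at least 4 bytes issued by instructions inside `WATCHED_CODE`.
/// Everything else returns `false`, so no instrumentation would be emitted.
pub fn should_hook_mem(pc: u64, write: bool, size: usize) -> bool {
    write && size >= 4 && WATCHED_CODE.contains(&pc)
}
```

In a real jitter, the exported hook_mem would simply forward its arguments to a function like this and return the result.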

This API is invoked when QEMU lifts target instructions. Lifting, in this case, is the core operation of an emulator, where it disassembles a target instruction and transforms it into an IL or JITs it to another architecture for execution. Since QEMU caches instructions it has already lifted, these functions are called "rarely" (with respect to how often the instructions themselves execute), and thus this is the location where you should put your smart logic to filter what you hook.

If you hook a select few instructions, the performance overhead of this tool is effectively zero. Cannoli is designed to provide very low overhead for full tracing, however if you don't need full tracing you should filter at this stage. This prevents the JIT from being instrumented in the first place, and provides a filtering mechanism for an end-user.
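To make "hook a select few instructions" concrete, here is a small sketch of a per-PC decision function. It declares a local stand-in enum with the three variant names mentioned earlier so that it compiles on its own; in a real jitter you would use jitter::HookType instead. The hook_decision name and the TRACED_PCS table are hypothetical.

```rust
/// Local stand-in for `jitter::HookType`, declared here only so the sketch
/// is self-contained. The variant names follow the ones listed in the text.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum HookType {
    Never,
    Always,
    Once,
}

/// Hypothetical list of PCs we want traced. It must stay sorted, because the
/// lookup below uses binary search.
const TRACED_PCS: [u64; 3] = [0x0040_1000, 0x0040_1234, 0x0040_2000];

/// Decision logic a `hook_inst` implementation could delegate to: instrument
/// only the listed PCs and leave every other instruction untouched.
pub fn hook_decision(pc: u64) -> HookType {
    if TRACED_PCS.binary_search(&pc).is_ok() {
        HookType::Always
    } else {
        HookType::Never
    }
}
```

Since the hooks run at lift time rather than on every execution, a lookup like this adds no cost to instructions that end up uninstrumented.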

Cannoli "client"

Cannoli then has a client component. The client's goal is to process the massive stream of data being produced by QEMU. Further, the API for Cannoli has been designed with threading in mind, such that a single thread can be running inside qemu-user, and complex analysis of that stream can be done by threading the analysis while getting maximum single-core performance in QEMU itself.

Cannoli exposes a standard Rust trait-style interface, where you implement Cannoli on your structure.

As an implementer of this trait, you must implement init. This is where you create a structure for both a single-threaded mutable context (Self), as well as a multi-threaded shared immutable context (Self::Context).

You then optionally can implement the callbacks for the Cannoli trait.

These callbacks are relatively self-explanatory, with the exception of the threading aspects. The three main execution callbacks exec, read, and write can be called from multiple threads in parallel. Thus, these are not called sequentially. This is where stateless processing should be done. These also only have immutable access to the Self::Context, as they run in parallel. This is the correct location to do any processing which does not need to know the ordering/sequence of instructions or memory accesses. For example, applying symbols where you convert from a pc into a symbol + address should be done here, such that you can symbolize the trace in parallel.
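The symbolization step mentioned above is a good fit for the parallel callbacks because it only reads shared data. A minimal sketch, assuming the symbol table is a sorted map from start address to name; the Symbols alias and the symbolize function are hypothetical and are not taken from the Cannoli symbolizer example.

```rust
use std::collections::BTreeMap;

/// Hypothetical symbol table: symbol start address -> symbol name.
pub type Symbols = BTreeMap<u64, String>;

/// Resolves `pc` to the nearest symbol starting at or below it, together
/// with the offset of `pc` from that symbol's start. Returns `None` when
/// `pc` lies below every known symbol.
pub fn symbolize(symbols: &Symbols, pc: u64) -> Option<(&str, u64)> {
    symbols
        .range(..=pc)
        .next_back()
        .map(|(&base, name)| (name.as_str(), pc - base))
}
```

The function takes only a shared reference, so a table stored in the immutable shared context could be queried from many threads at once without locking.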

All of the main callbacks (eg. exec) provide access to a trace buffer. Pushing values of type Self::Trace to this buffer allows you to sequence data. Events pushed to this buffer can be viewed in execution order when the trace is processed in the trace() callback.

This trace is then exposed back to the user fully in-order via the trace callback. The trace callback is called from various threads (eg. you might run in a different TID); however, it is ensured to always be called sequentially and in-order with respect to execution. Due to this, you get mutable access to self, as well as a reference to the shared Self::Context.
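The split between a parallel, stateless phase and a sequential, ordered phase can be modeled with nothing but the standard library. The sketch below is not the Cannoli API: Event, Context, Analysis, and parallel_phase are hypothetical stand-ins for Self::Trace, Self::Context, the implementing structure, and the threaded callbacks. Workers translate chunks of PCs using only shared read-only state, and a single consumer then walks the chunks in order with mutable state.

```rust
use std::thread;

/// Hypothetical per-event record, standing in for `Self::Trace`.
#[derive(Debug, PartialEq, Eq)]
pub enum Event {
    /// An executed PC, stored as an offset from the image base.
    Exec(u64),
}

/// Hypothetical shared read-only state, standing in for `Self::Context`.
pub struct Context {
    pub base: u64,
}

/// Parallel phase: each worker translates one chunk using only `&Context`.
/// Chunks are returned in their original order.
pub fn parallel_phase(ctx: &Context, pcs: &[u64], workers: usize) -> Vec<Vec<Event>> {
    let workers = workers.max(1);
    let chunk = ((pcs.len() + workers - 1) / workers).max(1);
    thread::scope(|s| {
        let handles: Vec<_> = pcs
            .chunks(chunk)
            .map(|part| {
                s.spawn(move || {
                    part.iter()
                        .map(|pc| Event::Exec(pc.wrapping_sub(ctx.base)))
                        .collect::<Vec<_>>()
                })
            })
            .collect();
        // Joining in spawn order keeps the chunks in execution order.
        handles.into_iter().map(|h| h.join().unwrap()).collect()
    })
}

/// Sequential phase: order-dependent state that needs `&mut self`.
#[derive(Default)]
pub struct Analysis {
    pub executed: u64,
    pub backward_jumps: u64,
    last: Option<u64>,
}

impl Analysis {
    /// Consumes one chunk of events; must be called on chunks in order.
    pub fn trace(&mut self, _ctx: &Context, events: &[Event]) {
        for Event::Exec(off) in events {
            self.executed += 1;
            if matches!(self.last, Some(prev) if *off < prev) {
                self.backward_jumps += 1;
            }
            self.last = Some(*off);
        }
    }
}
```

Counting backward jumps only works if events are seen in execution order, which is why that logic sits in the sequential phase, while the address translation, which needs no ordering, runs in the workers.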

I know this is a weird API, but it effectively allows parallelism of processing the trace until you absolutely need it to be sequential. I hope it's not too confusing for end users, but processing 2 billion instructions/second of data kind of requires threading on the consumer side, otherwise you bottleneck QEMU!

cannoli's People

Contributors

evanrichter, gamozolabs, johndoe31415, novafacing, palleiko


cannoli's Issues

[Discussion/Feature Request] Viability of Cannoli on qemu-system

A little bit of background here:
I am currently trying to port over Cannoli to XEMU, which is running a custom infrastructure for QEMU 6 in order to emulate the original Xbox. During the efforts I made porting the patches over, I noticed a comment that specifically mentioned that while Cannoli could run on qemu-system, it shouldn't as it would be pointless, due to loss of granularity.

Could you elaborate on what exactly is lost in the switch from qemu-user to qemu-system? And what would it take to actually get feature parity for Cannoli on qemu-system, if at all possible?

Future Windows implementation?

Very insane project.
Is it possible to have a Windows implementation in the future?
Or is it possible to emulate PE binaries with the help of Wine?

qemu_patches.patch fails

git am --3way ../cannoli/qemu_patches.patch 
Applying: Synced with 742848ad987b27fdbeab11323271ca7d196152fb
Applying: Style cleanup, more comments
Applying: Added PC support to memops
Applying: Updated path
Applying: Added --with-cannoli build flag
error: sha1 information is lacking or useless (include/tcg/tcg.h).
error: could not build fake ancestor
Patch failed at 0005 Added --with-cannoli build flag

additional hooks

Cool project! With a bit of tooling on top, I'll probably be able to replace many of my use cases for usercorn with a tool that works on more complex targets.

There are a few hooks I've found valuable to get a complete picture with this kind of tracing:

  • syscall (# + arg registers) - you can just emit a trace event in do_syscall()
  • mmap / munmap / mprotect (if a file is mmaped, I'd like enough information to best-effort mirror the mapping into a tracing tool. filename+offset may be sufficient for most cases? I'd also likely want to know about the initial mappings of the interpreter and executable.)
  • simple register change (e.g. r0, eax, etc)
  • special register change (e.g. MSR, SIMD)

Register change tracking is the reason I've wanted something more like cannoli for a long time - it would be so much faster to copy individual register writes to a buffer within the JIT, than what I was doing before (diff the register file repeatedly from a C helper)

cargo build error

cargo --version
cargo 1.61.0 (a028ae4 2022-04-29)

Linux CentOS7 X86-64

==============================

Compiling xcursor v0.3.4
Compiling cexpr v0.6.0
Compiling gpu-descriptor v0.2.2
Compiling env_logger v0.9.0
Compiling mempipe v0.1.0 (/data_sdd/qemu_highPerf/cannoli/mempipe)
error[E0432]: unresolved import alloc::ffi
--> mempipe/src/lib.rs:55:12
|
55 | use alloc::ffi::{CString, NulError};
| ^^^ could not find ffi in alloc

error[E0554]: #![feature] may not be used on the stable release channel
--> mempipe/src/lib.rs:29:1
|
29 | #![feature(maybe_uninit_uninit_array, array_from_fn)]
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

error[E0554]: #![feature] may not be used on the stable release channel
--> mempipe/src/lib.rs:30:1
|
30 | #![feature(inline_const, alloc_c_string)]
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

error[E0554]: #![feature] may not be used on the stable release channel
--> mempipe/src/lib.rs:29:12
|
29 | #![feature(maybe_uninit_uninit_array, array_from_fn)]
| ^^^^^^^^^^^^^^^^^^^^^^^^^

error[E0554]: #![feature] may not be used on the stable release channel
--> mempipe/src/lib.rs:29:39
|
29 | #![feature(maybe_uninit_uninit_array, array_from_fn)]
| ^^^^^^^^^^^^^

Can cannoli be used to do the instruction level tracing for a program?

Hi, I want to do instruction-level tracing of a program, which should include the instruction address, opcodes, memory reads, register reads, thread, and module. Can I modify Cannoli to achieve this goal?
My second question: does Cannoli affect the stack/heap layout of the target program? I want to record the execution of exploits for vulnerabilities, and if the heap/stack layout changes, the exploits may not work.
