blockspacer / libriscv

This project forked from fwsgonzo/libriscv

C++17 RISC-V RV32GC / RV64G userspace emulator library

CMake 5.11% C++ 74.69% Shell 1.01% C 16.37% Python 0.36% Dockerfile 0.37% HTML 1.66% Zig 0.13% VCL 0.29%

RISC-V userspace emulator library

Demonstration

You can find a live demonstration of the library here: https://droplet.fwsnet.net

Here is a multi-threaded test program: https://gist.github.com/fwsGonzo/e1a9cdc18f9da2ffc309fb9324a26c32

Benchmarks against LuaJIT

https://gist.github.com/fwsGonzo/f874ba58f2bab1bf502cad47a9b2fbed

Installing a RISC-V GCC embedded compiler

$ xpm install --global @xpack-dev-tools/riscv-none-embed-gcc@latest

See more here: https://gnu-mcu-eclipse.github.io/install/

Installing a RISC-V GCC linux compiler

To get C++ exceptions and other things, you will need a (limited) Linux userspace environment. You will need to build this cross-compiler yourself:

git clone https://github.com/riscv/riscv-gnu-toolchain.git
cd riscv-gnu-toolchain
./configure --prefix=$HOME/riscv --with-arch=rv32g --with-abi=ilp32d
make -j4

This will build a newlib cross-compiler with C++ exception support. The ABI is ilp32d, which passes floating-point arguments in registers and relies on the 32-bit and 64-bit floating-point instructions (the F and D extensions). This is much faster than software implementations of binary IEEE floating-point arithmetic.

Note that if you want a full glibc cross-compiler instead, simply appending linux to the make command will suffice, like so: make linux. Glibc is harder to support, and produces larger binaries, but will be more performant. It also supports threads, which is awesome to play around with.

The incantation for 64-bit RISC-V:

git clone https://github.com/riscv/riscv-gnu-toolchain.git
cd riscv-gnu-toolchain
./configure --prefix=$HOME/riscv --with-arch=rv64g --with-abi=lp64d
make -j4

Building and running a test program

From one of the binary subfolders:

$ ./build.sh

This will produce a hello_world binary in the sub-project's build folder.

Building the emulator and booting the newlib hello_world:

cd emulator
mkdir -p build && cd build
cmake .. && make -j4
./rvnewlib ../../binaries/newlib/build/hello_world

The emulator is built 3 times for different purposes. rvmicro is built for micro-environments with custom heap and threads. rvnewlib has hooked up enough system calls to run newlib. rvlinux has all the system calls necessary to run a normal userspace linux binary.

Building and running your own ELF files that can run in freestanding RV32GC is quite challenging, so consult the barebones example! It's a bit like booting on bare metal, except you can more easily implement system functions. The fun part is of course the extremely small binaries and total control over the environment.

The newlib example project has much more C and C++ support, but still lacks things like environment variables. This is deliberate, as newlib is intended for embedded development. It supports C++ RTTI and exceptions, and is the best middle ground for running a fuller C++ environment that still produces small binaries.

The full example project uses the Linux-configured cross-compiler and will expect you to implement quite a few system calls just to get into int main(). In addition, you will have to set up argv, env and the aux-vector. There is a helper method to do this in the src folder. There is also basic pthreads support.
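As a rough illustration of what setting up argv, env and the aux-vector involves, here is a toy sketch of the words a 32-bit Linux program expects to find at its initial stack pointer. The function name build_initial_stack is hypothetical and this is not the helper from the src folder; it assumes the strings themselves have already been placed in guest memory.

```cpp
#include <cstdint>
#include <utility>
#include <vector>

// Toy model (not libriscv code): lay out argc, argv, envp and auxv
// in the order a Linux program reads them from its initial stack.
std::vector<uint32_t> build_initial_stack(
	const std::vector<uint32_t>& argv_addrs,
	const std::vector<uint32_t>& envp_addrs,
	const std::vector<std::pair<uint32_t, uint32_t>>& auxv)
{
	std::vector<uint32_t> words;
	words.push_back(static_cast<uint32_t>(argv_addrs.size())); // argc
	words.insert(words.end(), argv_addrs.begin(), argv_addrs.end());
	words.push_back(0); // argv terminator
	words.insert(words.end(), envp_addrs.begin(), envp_addrs.end());
	words.push_back(0); // envp terminator
	for (const auto& entry : auxv) {
		words.push_back(entry.first);  // aux entry type
		words.push_back(entry.second); // aux entry value
	}
	words.push_back(0); // AT_NULL type ends the aux-vector
	words.push_back(0); // AT_NULL value
	return words;
}
```

The resulting words would be copied to the guest stack and the stack pointer set to the address of the first one (argc).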

And finally, the micro project implements the absolutely minimal freestanding RV32GC C/C++ environment. You won't have a heap implementation, so no new/delete. And you can't printf values because you don't have a C standard library, so you can only write strings and buffers using the write system call. Still, the stripped binary is only 784 bytes, and will execute only ~120 instructions running the whole program! The micro project actually initializes zero-initialized memory, calls global constructors and passes program arguments to main.

Instruction set support

The emulator currently supports RV32GC (IMAFDC) and RV64G (IMAFD). The F and D extensions (32- and 64-bit floating-point instructions) should be 100% supported, and there is a test suite for them. However, they haven't been extensively tested, as there are generally few FP instructions in normal programs.

Note: There is no support for the B-, E- and Q-extensions.
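The C in RV32GC means that 16-bit compressed instructions can be mixed with regular 32-bit ones. As a small sketch (not libriscv's decoder; instruction_length is a made-up name), the two lowest bits of the first 16-bit parcel are enough to tell them apart:

```cpp
#include <cstdint>

// Returns the length in bytes of the instruction that starts with this
// 16-bit parcel: 4 when the two lowest bits are 11, otherwise 2
// (a compressed instruction). Longer encodings are ignored in this sketch.
constexpr int instruction_length(uint16_t first_parcel)
{
	return (first_parcel & 0x3) == 0x3 ? 4 : 2;
}
```

For example, the low half of ecall (0x00000073) gives 4, while c.nop (0x0001) gives 2.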

Usage

Load a binary and let the machine simulate from _start (ELF entry-point):

#include <cstdio>
#include <string>
#include <vector>
#include <libriscv/machine.hpp>

template <int W>
long syscall_exit(riscv::Machine<W>& machine)
{
	printf(">>> Program exited, exit code = %d\n", machine.template sysarg<int> (0));
	machine.stop();
	return 0;
}

int main(int /*argc*/, const char** /*argv*/)
{
	const auto binary = <load your RISC-V ELF binary here>;

	riscv::Machine<riscv::RISCV32> machine { binary };
	// install a system call handler
	machine.install_syscall_handler(93, syscall_exit<riscv::RISCV32>);

	// add program arguments on the stack
	std::vector<std::string> args = {
		"hello_world", "test!"
	};
	machine.setup_argv(args);

	// this function will run until the exit syscall has stopped the machine
	// or an exception happens which stops execution
	machine.simulate();
}
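The binary placeholder in the example above can be filled in any way you like. One minimal sketch, assuming the machine accepts a std::vector<uint8_t> (as the empty-machine example in the next section suggests), is a small file loader. load_file is a hypothetical helper and not part of libriscv:

```cpp
#include <cstdint>
#include <fstream>
#include <iterator>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: read a whole file into a byte vector.
std::vector<uint8_t> load_file(const std::string& path)
{
	std::ifstream file(path, std::ios::binary);
	if (!file)
		throw std::runtime_error("Could not open file: " + path);
	// Copy every byte from the stream until end of file
	return std::vector<uint8_t>(
		std::istreambuf_iterator<char>(file),
		std::istreambuf_iterator<char>());
}
```

With this in place, the placeholder line could read const auto binary = load_file("hello_world");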

You can limit the amount of (virtual) memory the machine can use like so:

	const uint32_t max_memory = 1024 * 1024 * 64;
	riscv::Machine<riscv::RISCV32> machine { binary, max_memory };

You can limit the amount of instructions to simulate at a time like so:

	const uint64_t max_instructions = 1000;
	machine.simulate(max_instructions);

Similarly, when making a function call into the VM you can also add this limit as a template parameter to the vmcall() function.
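An instruction limit is what lets the host regain control at regular intervals. The following is a toy model of that idea only; ToyMachine and run_in_slices are invented for illustration and say nothing about how libriscv resumes execution internally:

```cpp
#include <cstdint>

// Toy stand-in for an emulator: each "instruction" just bumps a counter,
// and the program is done after program_length instructions.
struct ToyMachine {
	uint64_t executed = 0;
	uint64_t program_length;

	explicit ToyMachine(uint64_t length) : program_length(length) {}

	bool finished() const { return executed >= program_length; }

	// Execute at most max_instructions, then return to the caller.
	void simulate(uint64_t max_instructions)
	{
		for (uint64_t i = 0; i < max_instructions && !finished(); i++)
			executed++;
	}
};

// Run the toy machine in slices so the host can do other work
// (timeouts, I/O, scheduling) between them. Returns the slice count.
inline int run_in_slices(ToyMachine& machine, uint64_t slice)
{
	if (slice == 0)
		return 0; // a zero-sized slice would never make progress
	int slices = 0;
	while (!machine.finished()) {
		machine.simulate(slice);
		slices++;
	}
	return slices;
}
```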

You can find details on the Linux system call ABI online as well as in the syscalls.hpp and syscalls.cpp files in the src folder. You can use these examples to handle system calls in your RISC-V programs. The system calls emulate normal Linux system calls, and are compatible with a normal Linux RISC-V compiler.
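In the Linux RISC-V system call ABI, the system call number is passed in register a7, the arguments in a0 to a5, and the result comes back in a0. Number 93 is exit, which is why the usage example installs its handler there. Below is a toy dispatcher showing the convention; ToyCpu, ToySyscall and dispatch_ecall are hypothetical names, not libriscv types:

```cpp
#include <array>
#include <cstdint>
#include <functional>
#include <unordered_map>

// Toy register file x0..x31. In the RISC-V calling convention
// a0..a7 are x10..x17.
struct ToyCpu {
	std::array<uint32_t, 32> regs{};
	bool stopped = false;
};
constexpr int REG_A0 = 10;
constexpr int REG_A7 = 17;

using ToySyscall = std::function<long(ToyCpu&)>;

// Called when the guest executes ECALL: look up the handler by the number
// in a7 and store its result in a0. Unknown numbers yield -ENOSYS (-38).
inline void dispatch_ecall(ToyCpu& cpu,
	const std::unordered_map<uint32_t, ToySyscall>& handlers)
{
	auto it = handlers.find(cpu.regs[REG_A7]);
	const long result = (it != handlers.end()) ? it->second(cpu) : -38;
	cpu.regs[REG_A0] = static_cast<uint32_t>(result);
}
```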

Setting up your own machine environment

You can create a 64 KiB machine without a binary, in which case no ELF loader will be invoked.

	const uint32_t max_memory = 65536;
	std::vector<uint8_t> nothing; // taken as reference
	riscv::Machine<riscv::RISCV32> machine { nothing, max_memory };

Now you can copy your machine code directly into memory:

	std::vector<uint8_t> my_program_data;
	const uint32_t dst = 0x1000;
	machine.copy_to_guest(dst, my_program_data.data(), my_program_data.size());

Finally, let's jump to the program entry, and start execution:

	// example PC start address
	const uint32_t entry_point = 0x1068;
	machine.cpu.jump(entry_point);

	// geronimo!
	machine.simulate(5'000);
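If you have no machine code at hand, a tiny program can be assembled by hand. The sketch below encodes li a7, 93; li a0, 42; ecall, which is the Linux exit system call with status 42. The helper names (encode_addi, emit, make_exit_program) are made up for this example:

```cpp
#include <cstdint>
#include <vector>

// Encode ADDI rd, rs1, imm (I-type, opcode 0x13, funct3 0).
constexpr uint32_t encode_addi(uint32_t rd, uint32_t rs1, int32_t imm)
{
	return (static_cast<uint32_t>(imm & 0xFFF) << 20)
		| (rs1 << 15) | (rd << 7) | 0x13;
}

constexpr uint32_t ECALL = 0x00000073;

// Append a 32-bit instruction in little-endian byte order.
inline void emit(std::vector<uint8_t>& out, uint32_t instr)
{
	for (int i = 0; i < 4; i++)
		out.push_back(static_cast<uint8_t>(instr >> (8 * i)));
}

// li a7, 93 ; li a0, 42 ; ecall
inline std::vector<uint8_t> make_exit_program()
{
	std::vector<uint8_t> code;
	emit(code, encode_addi(17, 0, 93)); // a7 = x17: system call number
	emit(code, encode_addi(10, 0, 42)); // a0 = x10: first argument
	emit(code, ECALL);
	return code;
}
```

Bytes produced this way could serve as my_program_data. In that case the entry point would be the address the bytes were copied to, and a handler for system call 93 would need to be installed for the program to stop cleanly.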

Documentation

System calls

Freestanding environment

Function calls into the VM

Why a RISC-V library

It's a drop-in sandbox. Perhaps you want someone to be able to execute C/C++ code on a website, safely?

See the webapi folder for an example web-server that compiles and runs limited C/C++ code in a relatively safe manner. Ping me or create a PR if you notice something is exploitable.

Note that the web API demo uses a docker container to build RISC-V binaries, for security reasons. You can build the container with docker build -t newlib-rv32gc . -f newlib.Dockerfile from the docker folder. Alternatively, you could build a more full-fledged Linux environment using docker build -t linux-rv32gc . -f linux.Dockerfile. There is a test-script to see that it works called dbuild.sh which takes an input code file and output binary as parameters.

It can also be used as a script backend for a game engine, as it's quite a bit faster than LuaJIT, although it requires you to compile the scripts ahead of time as binaries using any computer language which can output RISC-V.

What to use for performance

Use Clang (newer is better) to compile the emulator. It is somewhere between 20-25% faster on almost everything.

Use GCC to build the RISC-V binaries. Use -O2 or -O3 and the regular standard extensions: -march=rv32gc -mabi=ilp32d. Enable the RISCV_EXPERIMENTAL option for the best performance unless you are using libriscv as a sandbox. Use -march=rv32g for the absolute best performance, if you have that choice. The difference is minimal, so don't go out of your way to build everything yourself. Try enabling the instruction decoder cache and see if it's faster for your case. Always enable the page cache. Always enable LTO. Fair warning: it's a bit harder to use Clang for freestanding RISC-V. 32-bit is a little bit faster than 64-bit, although you should probably just measure that.

Otherwise, if you are building the libc yourself, you can outsource all the heap functionality to the host using specialized system calls. See emulator/syscalls/src/native_heap.hpp, as well as the native_libc files. This manages the location of heap chunks outside of the emulator, while the heap memory itself is still inside the virtual memory of the guest binary. There is also an accelerated tiny threads implementation; see microthread.hpp and emulator/syscalls/src/native_threads.cpp. Check out the rvscript repository on GitHub for an actual implementation of this.
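To make the split between bookkeeping and memory concrete, here is a toy host-side heap. It is not the code in native_heap.hpp; ToyGuestHeap and its methods are invented for illustration. The chunk table lives on the host, while the addresses it hands out refer to guest memory:

```cpp
#include <cstddef>
#include <cstdint>
#include <map>

// Toy host-side heap over a range of guest addresses. Only the chunk
// table is stored here; the bytes themselves would live in guest memory.
class ToyGuestHeap {
public:
	ToyGuestHeap(uint32_t base, uint32_t size)
		: m_next(base), m_end(base + size) {}

	// Returns a guest address, or 0 when the request cannot be served.
	uint32_t allocate(uint32_t len)
	{
		len = (len + 15) & ~uint32_t(15); // round up to 16 bytes
		if (len == 0 || len > m_end - m_next)
			return 0;
		const uint32_t addr = m_next;
		m_next += len;
		m_chunks[addr] = len;
		return addr;
	}

	// Returns true if addr was a live chunk. Freed space is not
	// reused in this sketch.
	bool release(uint32_t addr) { return m_chunks.erase(addr) == 1; }

	std::size_t live_chunks() const { return m_chunks.size(); }

private:
	uint32_t m_next;
	uint32_t m_end;
	std::map<uint32_t, uint32_t> m_chunks; // guest address -> length
};
```

A guest-side malloc would then be a thin wrapper around a system call whose host handler calls allocate and returns the address in a0.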

If you have arenas available, you can replace the default page fault handler with your own, one that allocates faster than the regular heap. If you intend to use many (read: hundreds or thousands of) machines in parallel, you absolutely must use the forking constructor option, which applies copy-on-write to all pages on the newly created machine. Also, enable RISCV_EXPERIMENTAL so that both the decoder cache and execute page data are shared. Don't run any untrusted executables unless you audit the RISCV_EXPERIMENTAL feature.
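Copy-on-write is what makes forking cheap: a new machine shares every page with its parent and only duplicates a page when one side writes to it. The following single-threaded toy model shows the principle; ToyMemory is hypothetical and unrelated to libriscv's actual page handling:

```cpp
#include <array>
#include <cstdint>
#include <memory>
#include <unordered_map>

using ToyPage = std::array<uint8_t, 4096>;

// Toy copy-on-write memory (single-threaded): a fork shares all pages,
// and a page is duplicated on the first write through a shared handle.
class ToyMemory {
public:
	uint8_t read(uint32_t addr) const
	{
		auto it = m_pages.find(addr / 4096);
		return it == m_pages.end() ? 0 : (*it->second)[addr % 4096];
	}

	void write(uint32_t addr, uint8_t value)
	{
		auto& page = m_pages[addr / 4096];
		if (!page)
			page = std::make_shared<ToyPage>(); // zero-filled on demand
		else if (page.use_count() > 1)
			page = std::make_shared<ToyPage>(*page); // copy on write
		(*page)[addr % 4096] = value;
	}

	// Copies only the page table; the page contents stay shared.
	ToyMemory fork() const { return *this; }

	bool shares_page_with(const ToyMemory& other, uint32_t addr) const
	{
		auto a = m_pages.find(addr / 4096);
		auto b = other.m_pages.find(addr / 4096);
		return a != m_pages.end() && b != other.m_pages.end()
			&& a->second == b->second;
	}

private:
	std::unordered_map<uint32_t, std::shared_ptr<ToyPage>> m_pages;
};
```

Forking a thousand machines from one parent then costs a thousand page tables rather than a thousand copies of guest memory.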

libriscv's People

Contributors

fwsgonzo
