GithubHelp home page GithubHelp logo

wzhao18 / cricket Goto Github PK

View Code? Open in Web Editor NEW

This project forked from rwth-acs/cricket

0.0 0.0 0.0 4.78 MB

cricket is a virtualization solution for GPUs

License: MIT License

Shell 0.14% C++ 9.28% C 88.28% Cuda 0.67% Makefile 0.71% Dockerfile 0.09% RPC 0.82%

cricket's Introduction

cricket

pipeline status

Cricket consists of two parts: A virtualization layer for CUDA applications that allows the isolation of CPU and CPU parts by using Remote Procedure Calls and a checkpoint/restart tool for GPU kernels.

Dependencies

Cricket requires

  • CUDA Toolkit (E.g. CUDA 11.1)
  • rpcbind
  • libcrypto
  • libgdb (only for in-kernel C/R)
  • libtirpc

Libgdb and libtirpc built as part of the main Makefile.

On the system where the Cricket server should be executed, the appropriate NVIDIA drivers should be installed.

Building

git clone https://github.com/RWTH-ACS/cricket.git
cd cricket && git submodule update --init
LOG=INFO make

Environment variables for Makefile:

  • LOG: Log level. Can be one of DEBUG, INFO, WARNING, ERROR.
  • WITH_IB: If set to YES build with Infiniband support.
  • WITH_DEBUG: Use gcc debug flags for compilation

Running a CUDA Application

By default Cricket uses TCP/IP as a transport for the Remote Procedure Calls. This enables both remote execution, where server and client execute on different systems and local execution, where server and client execute on the same system. To support Cricket, the CUDA libraries must be linked dynamically to the CUDA application. For the runtime library, this can be done using the '-cudart shared' flag of nvcc.

The Cricket library has to be preloaded to the CUDA Application. For starting the server:

LD_PRELOAD=<path-to-cricket>/bin/cricket-server.so <cuda-binary>

The client can be started like this:

REMOTE_GPU_ADDRESS=<address-of-server> LD_PRELOAD=<path-to-cricket>/bin/cricket-client.so <cuda-binary>

Example: Running a test application locally

LD_PRELOAD=/opt/cricket/bin/cricket-server.so /opt/cricket/tests/test_kernel
REMOTE_GPU_ADDRESS=127.0.0.1 LD_PRELOAD=/opt/cricket/bin/cricket-client.so /opt/cricket/tests/test_kernel

Example: Running the nbody CUDA sample using Cricket on a remote system

Compile the application

cd /nfs_share/cuda/samples/5_Simulations/nbody
make NVCCFLAGS="-m64 -cudart shared" GENCODE_FLAGS="-arch=sm_61"

Start the Cricket server

LD_PRELOAD=/nfs_share/cricket/bin/cricket-server.so /nfs_share/cuda/samples/5_Simulations/nbody/nbody

Run the application

REMOTE_GPU_ADDRESS=remoteSystem.my-domain.com LD_PRELOAD=/nfs_share/cricket/bin/cricket-client.so /nfs_share/cuda/samples/5_Simulations/nbody/nbody -benchmark

Contributing

File structue

  • cpu: The virtualization layer
  • gpu: The checkpoint/restart tool
  • submodules: Submodules are located here.
    • cuda-gdb: modified GDB for use with CUDA. We mostly need the modified libbfd for gathering information from the CUDA ELF.
    • libtirpc: Transport Indepentend Remote Procedure Calls is requried for the virtualization layer-
  • tests: some synthetic CUDA applications to test cricket.
  • utils: A Dockerfile for repoducibility and for our CI.

Style Guidelines:

set cindent
set tabstop=4
set shiftwidth=4
set expandtab
set cinoptions=(0,:0,l1,t0,L3
match ErrorMsg /\s\+$\| \+\ze\t/

This project adheres to the Linux Kernel Coding Style, except when it doesn't.

Etymology: Cricket is an abbreviation for Checkpoint Restart In Cuda KErnels Tool

cricket's People

Contributors

fuentesgrau avatar jounathaen avatar n-eiling avatar philipp-fensch avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.