nexus's Introduction

Nexus

Nexus is a scalable and efficient serving system for DNN applications on GPU clusters.

SOSP 2019 Paper

  • Check out our SOSP 2019 paper here.
  • Check out the Google Drive folder that contains a sample of the video dataset.

Building Nexus

See BUILDING.md for details.

Docker and Examples

We provide a Docker image so that you can try Nexus quickly. There is also a step-by-step example that shows how to run Nexus with a simple application. We recommend taking a look here.

Deployment

Download Model Zoo

Nexus publishes a public model zoo on our department-hosted GitLab. To download it, install Git LFS first, then run:

git clone https://gitlab.cs.washington.edu/syslab/nexus-models
cd nexus-models
git lfs checkout
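
Because the clone depends on Git LFS, it can help to check for it up front. The following is a minimal sketch, not part of Nexus: `require_cmd` and `fetch_model_zoo` are hypothetical helper names, and the sketch assumes Git LFS is installed as the `git-lfs` executable on your `PATH`.

```shell
# Hypothetical helper: succeed only if the named command is on PATH.
require_cmd() {
    command -v "$1" >/dev/null 2>&1
}

# Hypothetical helper: run the download steps from this README,
# but stop early with a message if git or Git LFS is missing.
fetch_model_zoo() {
    require_cmd git || { echo "git not found" >&2; return 1; }
    require_cmd git-lfs || { echo "Git LFS not found; install it first" >&2; return 1; }
    git clone https://gitlab.cs.washington.edu/syslab/nexus-models &&
        cd nexus-models &&
        git lfs checkout
}
```

Calling `fetch_model_zoo` runs the same three commands as above and leaves you inside the `nexus-models` directory.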

Run the Profiler

Nexus is a profile-based system, so before running it, make sure you have profiled every GPU. To profile a given model on a given GPU, run:

nexus/tools/profiler/profiler.py --gpu_list=GPU_INDEX --gpu_uuid \
    --framework=tensorflow --model=MODEL_NAME \
    --model_root=nexus-models/ --dataset=/path/to/datasets/

The profile will be saved to the --model_root directory. See examples for more concrete usage.
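
When a machine has several GPUs, the command above has to be repeated per GPU and per model. The sketch below only prints the commands (a dry run) so they can be reviewed first. `profile_cmd` is a hypothetical helper, the GPU indexes `0 1` are an assumption about your machine, and `MODEL_NAME` and the paths are the same placeholders used above. It passes one index at a time because this README does not say whether `--gpu_list` accepts several.

```shell
# Hypothetical helper: print the profiler command for one GPU index ($1)
# and one model name ($2), using only the flags shown in this README.
profile_cmd() {
    printf '%s --gpu_list=%s --gpu_uuid --framework=tensorflow --model=%s --model_root=%s --dataset=%s\n' \
        "nexus/tools/profiler/profiler.py" "$1" "$2" "nexus-models/" "/path/to/datasets/"
}

# Dry run: one command per (GPU, model) pair.
# Assumption: GPUs 0 and 1 exist; replace MODEL_NAME with real model names.
for gpu in 0 1; do
    for model in MODEL_NAME; do
        profile_cmd "$gpu" "$model"
    done
done
```

Once the printed commands look right, they can be run one at a time, or the loop output can be piped to `sh`.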

Run Nexus

To run Nexus, start the scheduler first, then spawn a backend for each GPU, and finally run the Nexus frontend of your application. See examples for more concrete usage.

nexus's People

Contributors

abcdabcd987, crystalrem, icemelon

nexus's Issues

Issues with Building Apps and Docker Images

Hello,
I am interested in trying out Nexus. After following your installation instructions, I was able to build Nexus, but I have been unable to (1) run the local tests without Docker, or (2) build the Docker images. For (1), I cannot build simple_app (or any app) because libnexus is not built by the installation instructions, and so far I have not managed to build it by modifying the root CMakeLists.txt or the app's Makefile. For (2), the base image builds, but the other three do not, for various reasons: NexusBackendDockerfile fails when making the backend and tools, seemingly because of caffe2 (I tried to disable it by not compiling that backend, without success), while NexusSchedulerDockerfile and NexusAppLibDockerfile seem to have variable scope and type errors.

Any guidance for how to get it working would be greatly appreciated. Thank you in advance!

std out of range error

Hi, I came across this error today. I pulled the Docker image and tried to profile the model, but it gave me this error. What could the problem be? I am running the model on an A100 GPU; is it not supported by Nexus? Thanks so much if anyone can help me with this issue!

image

Scheduler cannot find forward latency

Hi, I followed the latest example to run Nexus on Docker. However, when I start the frontend, the scheduler crashes with "F0322 14:16:14.132954 14 model_db.cpp:96] Cannot find forward latency: model=tensorflow:resnet_0:1:224x224 batch=1". Details are in the images below.

image

image

image

Please help me with this, thank you.
