This project is forked from seldonio/trtis-k8s-scheduler. License: Apache License 2.0.

TRTIS K8S Scheduler

Proof of concept to schedule ML models onto NVIDIA TensorRT Inference Servers running as Kubernetes DaemonSets.

Motivation

Provide the ability for ML models to share GPUs to save on infrastructure costs.

GPU Sharing Goals

GPU sharing has several sub-requirements.

  • Scheduling
    • Decide which GPU to attach a model to.
  • Isolation
    • Ensure multi-tenant users cannot interfere with each other.
  • Fairness
    • Ensure each user has fair access to GPU resources.

We will mostly be concerned with scheduling.

Existing Resources

Proposal

Follow the work of Alibaba to provide a custom scheduler, but rather than using a low-level NVIDIA device plugin, utilize TRTIS servers to run the models.

Components

  • TRTIS DaemonSet
    • TRTIS running on each GPU node as a k8s DaemonSet using an NFS-backed model repository.
  • Scheduler
    • Custom scheduler that watches for pods assigned to it and decides which node to place them on.
  • Monitor
    • Runs alongside the TRTIS server to expose GPU metrics on the node as annotations.
  • Loader (initContainer)
    • Loads the model into the TRTIS model repository for a node.
  • Proxy/Unloader
    • Optional proxy that forwards API requests to the TRTIS server on the node.
    • Unloads the model from the server when terminated.

ML Pod Scheduling Requirements

To be scheduled, a pod must:

  • Have a custom resource limit seldon.io/trtis-gpu-mem
    • This specifies the GPU memory required.
  • Have an annotation for the model ID: seldon.io/trtis-model-id
    • This ensures a model is not scheduled more than once on any node.
  • Have the custom schedulerName set: schedulerName: trtis-scheduler

In this demo the pod will be defined via a Deployment with the following containers:

  • A first initContainer, gcr.io/kfserving/storage-initializer:0.2.1, to download the model from cloud storage to local disk.
  • A second initContainer, seldonio/trtis-loader:0.1, to load the model into the TRTIS model repo and wait for TRTIS to show it is loaded.
  • A container seldonio/trtis-proxy:0.1
    • Acts as an optional proxy for REST and gRPC requests to the server, as well as a possible isolation enforcer that only allows requests to the model loaded on the server.
    • Unloads the model on termination.
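A minimal sketch of such a Deployment follows. The model name, GPU memory value, storage URI, and container arguments are placeholders; the real loader and proxy images may take different arguments.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: mymodel
spec:
  replicas: 1
  selector:
    matchLabels:
      app: mymodel
  template:
    metadata:
      labels:
        app: mymodel
      annotations:
        seldon.io/trtis-model-id: mymodel   # prevents double-scheduling on a node
    spec:
      schedulerName: trtis-scheduler        # handled by the custom scheduler
      initContainers:
      - name: storage-initializer
        image: gcr.io/kfserving/storage-initializer:0.2.1
        args: ["gs://my-bucket/mymodel", "/mnt/models"]  # placeholder source URI
        volumeMounts:
        - name: model
          mountPath: /mnt/models
      - name: loader
        image: seldonio/trtis-loader:0.1
        volumeMounts:
        - name: model
          mountPath: /mnt/models
      containers:
      - name: proxy
        image: seldonio/trtis-proxy:0.1
        resources:
          limits:
            seldon.io/trtis-gpu-mem: "1000"  # placeholder GPU memory requirement
      volumes:
      - name: model
        emptyDir: {}
```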

Scheduling Steps

  1. A pod with the appropriate settings discussed above is created. This could be done via an operator using a CRD for the model definition, e.g. KFServing or Seldon.
  2. The TRTIS-Scheduler will currently:
    • For each node:
      • Calculate the total memory for pods assigned to that node from their seldon.io/trtis-gpu-mem limits.
      • Get the total available memory on the node via the node annotation seldon.io/trtis-gpu-mem-total.
      • Check the running model IDs via the pod annotations seldon.io/trtis-model-id.
    • A pod can be scheduled on a node if there is enough memory and the same model ID is not already on that node.
    • Choose a random node from the available nodes and bind the pod to it.
    • If no node satisfies the constraints, the pod is placed back in the scheduling queue with an exponential backoff (max 2 mins). Its status field will remain “Pending” until it is scheduled.
  3. When the pod starts on the node it will:
    • Download the model from cloud storage.
    • Upload the model to the TRTIS model repository on that node.
      • Optionally, in future, update an “Endpoint” to add this node to a Service for this model.
    • Wait for the TRTIS server to report the model is loaded (the TRTIS server runs in POLL mode).
    • Run the main container, a proxy that forwards REST and gRPC requests.
  4. On termination the pod deletes its folder from the TRTIS model repository.
    • Optionally, in future, remove this TRTIS node from the “Endpoint” of the Service for this model.
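The filtering and binding decision in step 2 can be sketched in Go. This is an illustrative sketch, not the scheduler's actual code: the `Node` struct and `pickNode` function are hypothetical names, and the annotation values are assumed to have already been parsed into integers.

```go
package main

import (
	"fmt"
	"math/rand"
)

// Node holds the per-node state the scheduler derives from annotations.
type Node struct {
	Name        string
	TotalGPUMem int64           // from the seldon.io/trtis-gpu-mem-total node annotation
	UsedGPUMem  int64           // sum of seldon.io/trtis-gpu-mem over pods already bound here
	ModelIDs    map[string]bool // seldon.io/trtis-model-id of pods already on the node
}

// pickNode returns a random node that has enough free GPU memory and is not
// already running the model. It returns "" if no node qualifies, in which
// case the pod would go back to the scheduling queue with backoff.
func pickNode(nodes []Node, modelID string, memRequired int64) string {
	var candidates []string
	for _, n := range nodes {
		if n.UsedGPUMem+memRequired <= n.TotalGPUMem && !n.ModelIDs[modelID] {
			candidates = append(candidates, n.Name)
		}
	}
	if len(candidates) == 0 {
		return ""
	}
	return candidates[rand.Intn(len(candidates))]
}

func main() {
	nodes := []Node{
		{Name: "gpu-node-1", TotalGPUMem: 16000, UsedGPUMem: 14000, ModelIDs: map[string]bool{"mymodel": true}},
		{Name: "gpu-node-2", TotalGPUMem: 16000, UsedGPUMem: 4000, ModelIDs: map[string]bool{}},
	}
	// Only gpu-node-2 has room and is not already running "mymodel".
	fmt.Println(pickNode(nodes, "mymodel", 4000)) // prints "gpu-node-2"
}
```

In the real scheduler the chosen node would then be bound to the pod via the Kubernetes binding API.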

API Requests

There are two options:

  1. Use the proxy service running on the nodes.
    • Advantages
      • Allows for isolation of requests, ensuring other models on the TRTIS server cannot be called.
      • Easy to create a Service which automatically load balances as more replicas of the model are created (manually or via auto-scaling).
    • Disadvantages
      • Adds an extra network hop and proxy step.
  2. Use a custom Service and Endpoints object.
    • Advantages
      • No proxy step.
    • Disadvantages
      • No immediate model isolation.
      • Needs a custom controller to update the Endpoints as models are added to or removed from TRTIS nodes. A standard Service is not possible as all TRTIS DaemonSet pods will have the same labels.
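Option 2 can be sketched as a selector-less Service paired with a manually managed Endpoints object. The names, port, and IP below are placeholders; the hypothetical custom controller would rewrite the addresses as models are loaded onto or removed from TRTIS nodes.

```yaml
# Service with no selector, so Kubernetes does not manage its Endpoints.
apiVersion: v1
kind: Service
metadata:
  name: mymodel
spec:
  ports:
  - name: http
    port: 8000
---
# Endpoints object with the same name as the Service; the custom
# controller would add/remove node addresses here.
apiVersion: v1
kind: Endpoints
metadata:
  name: mymodel
subsets:
- addresses:
  - ip: 10.0.0.5   # placeholder: a node whose TRTIS has this model loaded
  ports:
  - name: http
    port: 8000
```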

Demo

Contributors

ukclivecox, gaocegege
