semoss / remote-client-server

Remote Client Server

This is a FastAPI application that supports TCP and HTTP clients. The server can be run locally or in a Docker container. This server is used to serve different gen-AI models that require a GPU for inference.

This server uses a queue to manage GPU consumption. The queue accepts WebSocket connections and processes requests in the order they are received. The server can be scaled horizontally by running multiple instances.
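
The in-order processing described above can be illustrated with a single asyncio worker that pulls requests from an asyncio.Queue and runs them one at a time. This is a minimal sketch, not the project's QueueManager: run_inference, gpu_worker, submit and processed are hypothetical names, and run_inference only simulates GPU work with a short sleep.

```python
import asyncio

processed: list[str] = []  # records the order in which requests were handled


async def run_inference(prompt: str) -> str:
    """Hypothetical stand-in for a GPU-bound model call."""
    await asyncio.sleep(0.01)
    processed.append(prompt)
    return f"result for {prompt!r}"


async def gpu_worker(queue: asyncio.Queue) -> None:
    """Process queued requests strictly in arrival (FIFO) order, one at a time."""
    while True:
        prompt, future = await queue.get()
        try:
            result = await run_inference(prompt)
        except Exception as exc:  # hand model errors back to the waiting client
            if not future.done():
                future.set_exception(exc)
        else:
            if not future.done():  # the client may have disconnected meanwhile
                future.set_result(result)
        finally:
            queue.task_done()


async def submit(queue: asyncio.Queue, prompt: str) -> str:
    """Enqueue a request and wait until the worker has produced its result."""
    future = asyncio.get_running_loop().create_future()
    await queue.put((prompt, future))
    return await future


async def main() -> list[str]:
    queue: asyncio.Queue = asyncio.Queue()
    worker = asyncio.create_task(gpu_worker(queue))
    results = await asyncio.gather(*(submit(queue, p) for p in ["cat", "dog", "fox"]))
    worker.cancel()
    return results


if __name__ == "__main__":
    print(asyncio.run(main()))
```

Because only one worker consumes the queue in this sketch, at most one inference runs at a time; each horizontally scaled instance would hold its own queue.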

This is currently set up to run only a single type of model at a time. See the Adding New Models section for information on how to add new models.

Current Supported Models

The following models are currently supported. Set the MODEL environment variable to the short name of the model you want to load (for example, MODEL=pixart).

  • Image Generation
    • PixArt-alpha/PixArt-XL-2-1024-MS (short name: pixart)

Local Installation (Assumes Windows w/ Anaconda)

Running PyTorch with CUDA on Windows can be a bit tricky and the steps may vary based on your system configuration. The following steps should help you get started.

  • conda activate base
  • conda create --name your_environment_name python=3.11
  • conda activate your_environment_name
  • conda install cuda --channel nvidia/label/cuda-12.4.0
  • pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu124
  • conda env update -f environment.yml
  • pip install -r requirements.txt

PyTorch/CUDA

  • You can test your local PyTorch/CUDA installation by using the utils/torch_test.ipynb notebook.
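
Outside Jupyter, a short script can report the same basics. This is a hypothetical helper (cuda_report), not the contents of utils/torch_test.ipynb; it uses only standard PyTorch calls and still runs, reporting torch_installed as False, when PyTorch is not installed.

```python
import importlib.util


def cuda_report() -> dict:
    """Summarize whether PyTorch is installed and can see a CUDA GPU."""
    if importlib.util.find_spec("torch") is None:
        return {"torch_installed": False, "cuda_available": False}

    import torch  # imported lazily so the check also runs where torch is missing

    report = {
        "torch_installed": True,
        "torch_version": torch.__version__,
        "cuda_available": torch.cuda.is_available(),
        "cuda_build": torch.version.cuda,  # None for CPU-only wheels
    }
    if report["cuda_available"]:
        report["device_name"] = torch.cuda.get_device_name(0)
    return report


if __name__ == "__main__":
    for key, value in cuda_report().items():
        print(f"{key}: {value}")
```

With the cu124 wheels from the installation steps, cuda_build is expected to read 12.4; a value of None means a CPU-only build was installed.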

Downloading Model Files

  • The model files are too large to store on GitHub and are downloaded when the Docker container starts up.
  • Additionally, the model files are too large to store more than one model at a time in a Docker container. Refer to download.py for the logic that checks for and downloads the model files on server startup.
  • When developing locally, you can download the model files with the utils/dl_model_files.py script or let the startup lifecycle do it for you (this removes any existing files in model_files).
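
The startup behavior described above (reuse the files if present, otherwise clear model_files and download) can be sketched as follows. This is not the contents of download.py: ensure_model_files and hf_download are hypothetical helpers, the one-directory-per-short-name layout is assumed, and the default downloader relies on huggingface_hub.snapshot_download, which must be installed separately.

```python
import shutil
from pathlib import Path
from typing import Callable, Optional


def hf_download(repo_id: str, target: Path) -> None:
    """Default downloader: fetch a full snapshot of a Hugging Face repo."""
    from huggingface_hub import snapshot_download  # requires huggingface_hub

    snapshot_download(repo_id=repo_id, local_dir=str(target))


def ensure_model_files(
    short_name: str,
    repo_id: str,
    model_root: Path = Path("model_files"),
    download: Optional[Callable[[str, Path], None]] = None,
) -> Path:
    """Return the model directory, downloading it first if it is missing or empty."""
    target = model_root / short_name
    if target.is_dir() and any(target.iterdir()):
        return target  # files already present, e.g. from a mounted volume

    # Only one model fits at a time, so clear everything under model_root.
    # Deleting the contents (not model_root itself) keeps a volume mount point intact.
    if model_root.exists():
        for child in list(model_root.iterdir()):
            if child.is_dir() and not child.is_symlink():
                shutil.rmtree(child)
            else:
                child.unlink()
    target.mkdir(parents=True, exist_ok=True)
    (download or hf_download)(repo_id, target)
    return target
```

Injecting the download function keeps the check testable without network access; in practice it would be called as ensure_model_files("pixart", "PixArt-alpha/PixArt-XL-2-1024-MS").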

Running the Server Locally

  • You can run the server locally using the server/main.py script.
python server/main.py
  • You can specify the host and port using the --host and --port flags.
python server/main.py --host "127.0.0.1" --port 5000

Port

  • The server runs on localhost:8888 unless otherwise specified.
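
A sketch of how the --host and --port flags could be parsed with argparse, falling back to the HOST and PORT environment variables passed in the docker run commands and then to port 8888. The environment-variable fallback, the 127.0.0.1 default host, the use of uvicorn, and the "main:app" import path are assumptions, not confirmed details of server/main.py.

```python
import argparse
import os


def parse_args(argv=None) -> argparse.Namespace:
    """Parse --host/--port, falling back to HOST/PORT env vars, then to defaults."""
    parser = argparse.ArgumentParser(description="Run the remote client server.")
    parser.add_argument("--host", default=os.environ.get("HOST", "127.0.0.1"))
    parser.add_argument("--port", type=int, default=int(os.environ.get("PORT", "8888")))
    return parser.parse_args(argv)


def main() -> None:
    import uvicorn  # assumed ASGI server for the FastAPI app

    args = parse_args()
    # "main:app" is a hypothetical import path for the FastAPI instance.
    uvicorn.run("main:app", host=args.host, port=args.port)


if __name__ == "__main__":
    main()
```

Command-line flags take precedence over the environment, so the same script works both locally and with the -e HOST=0.0.0.0 -e PORT=8888 options used in the Docker examples.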

Docker

docker build -t remote-client-server .

If you run the container without a volume attached, make sure the model files are downloaded in the model_files directory.

docker run -p 8888:8888 -e MODEL=pixart -e HOST=0.0.0.0 -e PORT=8888 --gpus all --name remote-client-server remote-client-server

To run the container with a volume containing the model files attached:

docker run --rm -p 8888:8888 -e MODEL=pixart -e HOST=0.0.0.0 -e PORT=8888 --gpus all --name remote-client-server -v pixart-volume:/app/model_files remote-client-server

Docker Volumes

  • You can use Docker volumes to store the model files.
  • The volume should contain one root directory per model, named after the model's short name (for example, pixart).
  • The volume is attached to the container at /app/model_files.

Access API Documentation

  • http://127.0.0.1:8888/docs for Swagger UI documentation.
  • http://127.0.0.1:8888/redoc for ReDoc documentation.

Access API Endpoints

  • ws://localhost:8888/api/generate - Gen-AI WebSocket endpoint.

  • http://localhost:8888/api/health - Health check endpoint.

  • http://localhost:8888/api/status - Returns an object with values for the current model, queue size, GPU utilization and server status.

  • http://localhost:8888/api/models/{model} - Takes a model short name as a parameter and returns whether the correct model files are present in the model_files directory.
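
The check behind /api/models/{model} could look roughly like this, assuming the model directory follows the diffusers layout in which model_index.json stores the pipeline class under _class_name (this is where the expected_model_class setting described under Adding New Models comes from). model_files_present is a hypothetical helper, not the project's route handler.

```python
import json
from pathlib import Path


def model_files_present(
    short_name: str, expected_model_class: str, model_root: Path = Path("model_files")
) -> bool:
    """Return True if model_root/<short_name>/model_index.json names the expected class."""
    index_file = model_root / short_name / "model_index.json"
    if not index_file.is_file():
        return False
    try:
        index = json.loads(index_file.read_text(encoding="utf-8"))
    except json.JSONDecodeError:
        return False  # a truncated or corrupt download counts as missing
    # Diffusers pipelines record their pipeline class under "_class_name".
    return isinstance(index, dict) and index.get("_class_name") == expected_model_class
```

Comparing the class name, rather than only checking that the directory exists, also catches a directory that holds the wrong model.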

Adding New Models

  • Add a new file and class to the app/gaas directory to support the new model.
  • In model_utils/model_config.py, add the model config to the SUPPORTED_MODELS object.
  • The expected_model_class can be found in the model_index.json of the downloaded model files.
  • If you are adding a new TYPE of model, update the model_switch() method in the QueueManager class to support the new model.
  • You can enforce type checking with pydantic by adding a new class to the server/pydantic_models directory.
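
The steps above can be sketched as an abstract base class plus a type-to-class lookup inside model_switch(). GenAIModel, MODEL_TYPES and this simplified QueueManager are hypothetical; ImageGen here is a stand-in that echoes the prompt, not the repository's real ImageGen class.

```python
from abc import ABC, abstractmethod


class GenAIModel(ABC):
    """Hypothetical common interface for model classes in app/gaas."""

    def __init__(self, model_dir: str) -> None:
        self.model_dir = model_dir

    @abstractmethod
    def generate(self, request: dict) -> dict:
        """Run inference for one request."""


class ImageGen(GenAIModel):
    """Stand-in image model; a real one would call its loaded pipeline."""

    def generate(self, request: dict) -> dict:
        return {"type": "image", "prompt": request["prompt"]}


class QueueManager:
    """Only the model-switching part of a queue manager is sketched here."""

    # Maps a model TYPE to the class that serves it; a new TYPE gets a new entry.
    MODEL_TYPES = {"image": ImageGen}

    def __init__(self) -> None:
        self.model: GenAIModel | None = None

    def model_switch(self, model_type: str, model_dir: str) -> GenAIModel:
        try:
            model_cls = self.MODEL_TYPES[model_type]
        except KeyError:
            raise ValueError(f"Unsupported model type: {model_type!r}") from None
        self.model = model_cls(model_dir)
        return self.model
```

With this structure, adding a model of an existing TYPE needs only a config entry, while a new TYPE needs a new subclass and one new MODEL_TYPES entry.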

Formatting

  • This project uses the Black code formatter. Please install the Black formatter in your IDE to ensure consistent code formatting before submitting a PR.

TO DO:

  • Update ImageGen class to use generic pipeline and abstract class for different image generation models.
  • Update the generation route to dynamically type-check the request for different models.
  • Add a semaphore and a Docker environment variable for setting the number of concurrent operations utilizing the GPU (currently set to 1).
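
The semaphore item in the list above could be approached with asyncio.Semaphore, where a limit of 1 corresponds to the current behavior. GPU_CONCURRENCY is a hypothetical environment variable name and GpuLimiter a hypothetical helper; demo() only simulates GPU work and reports the highest number of jobs that ran concurrently.

```python
import asyncio
import os

# Hypothetical env var name; the TODO only says the limit should be configurable.
GPU_CONCURRENCY = int(os.environ.get("GPU_CONCURRENCY", "1"))


class GpuLimiter:
    """Caps how many coroutines may use the GPU at the same time."""

    def __init__(self, limit: int = GPU_CONCURRENCY) -> None:
        self._semaphore = asyncio.Semaphore(limit)
        self.active = 0  # jobs currently holding the semaphore
        self.peak = 0    # highest value of `active` observed

    async def run(self, job):
        async with self._semaphore:
            self.active += 1
            self.peak = max(self.peak, self.active)
            try:
                return await job()
            finally:
                self.active -= 1


async def demo(limit: int) -> int:
    limiter = GpuLimiter(limit)

    async def fake_inference():
        await asyncio.sleep(0.01)  # stand-in for GPU work

    await asyncio.gather(*(limiter.run(fake_inference) for _ in range(6)))
    return limiter.peak


if __name__ == "__main__":
    print(asyncio.run(demo(GPU_CONCURRENCY)))
```

For example, asyncio.run(demo(2)) returns 2: six jobs are submitted, but no more than two hold the semaphore at once.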
