
williamtran29 / vdp


This project forked from instill-ai/instill-core


💧 Instill VDP (Versatile Data Pipeline) is an open-source tool to seamlessly integrate AI to process unstructured data in the modern data stack

Home Page: https://www.instill.tech

License: Other

JavaScript 56.97% Makefile 30.87% Smarty 8.43% Dockerfile 3.73%

vdp's Introduction

Versatile Data Pipeline: unstructured data ETL


Instill VDP

Licenses: MIT and ELv2 (see the License section below)

Versatile Data Pipeline (VDP) is a source-available unstructured data ETL tool that streamlines the end-to-end unstructured data processing pipeline:

  • Extract unstructured data from pre-built data sources such as cloud/on-prem storage or IoT devices

  • Transform it into analysable or meaningful data representations using AI models

  • Load the transformed data into warehouses, applications, or other destinations

VDP Concept

Highlights

Demo playground

An online demo VDP instance is available at https://demo.instill.tech, where you can try out the basic features directly in its Console.

Want to showcase your ML/DL models? We offer a fully managed VDP on Instill Cloud. Please sign up via the form and we will reach out to you.

Prerequisites

  • macOS or Linux - VDP works on macOS or Linux, but does not support Windows yet.

  • Docker and Docker Compose - VDP uses Docker Compose (specifically, Compose V2 and the Compose Specification) to run all services locally. Please install the latest stable Docker and Docker Compose before using VDP.

  • yq > v4.x - Please follow the installation guide.

  • (Optional) NVIDIA Container Toolkit - To enable GPU support in VDP, please refer to the NVIDIA Cloud Native Documentation to install the NVIDIA Container Toolkit. To allocate specific GPUs to VDP, set the environment variable NVIDIA_VISIBLE_DEVICES. For example, NVIDIA_VISIBLE_DEVICES=0,1 makes triton-server use only GPU devices 0 and 1. By default, NVIDIA_VISIBLE_DEVICES is set to all, which uses all available GPUs on the machine.
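The prerequisites above can be checked before running make all. The following is a minimal Python sketch, not a tool shipped with VDP; it assumes that `docker compose version` and `yq --version` print a version string such as `v2.17.2` or `4.30.8`, and the function names are hypothetical. It only reads NVIDIA_VISIBLE_DEVICES and does not verify that the NVIDIA Container Toolkit is installed.

```python
import os
import platform
import re
import shutil
import subprocess
from typing import List, Mapping, Optional, Tuple


def parse_version(text: str) -> Optional[Tuple[int, ...]]:
    """Return the first dotted version number in CLI output, e.g. 'v4.30.8' -> (4, 30, 8)."""
    match = re.search(r"v?(\d+)\.(\d+)(?:\.(\d+))?", text)
    if match is None:
        return None
    return tuple(int(part) for part in match.groups() if part is not None)


def tool_version(cmd: List[str]) -> Optional[Tuple[int, ...]]:
    """Run a version command and parse its output; None if the tool is missing or fails."""
    if shutil.which(cmd[0]) is None:
        return None
    result = subprocess.run(cmd, capture_output=True, text=True, check=False)
    if result.returncode != 0:
        return None
    return parse_version(result.stdout + result.stderr)


def check_prerequisites() -> List[str]:
    """Collect human-readable problems; an empty list means every check passed."""
    problems = []
    system = platform.system()  # 'Darwin' on macOS, 'Linux' on Linux
    if system not in ("Darwin", "Linux"):
        problems.append(f"unsupported OS: {system} (VDP needs macOS or Linux)")
    compose = tool_version(["docker", "compose", "version"])  # Compose V2 plugin syntax
    if compose is None or compose[0] < 2:
        problems.append("Docker Compose V2 not found")
    yq = tool_version(["yq", "--version"])
    if yq is None or yq[0] < 4:
        problems.append("yq v4.x or later not found")
    return problems


def visible_gpus(env: Optional[Mapping[str, str]] = None) -> str:
    """Value of NVIDIA_VISIBLE_DEVICES, falling back to 'all' (the default described above)."""
    env = os.environ if env is None else env
    return env.get("NVIDIA_VISIBLE_DEVICES", "all")


if __name__ == "__main__":
    print("NVIDIA_VISIBLE_DEVICES:", visible_gpus())
    issues = check_prerequisites()
    for issue in issues:
        print("-", issue)
    print("All prerequisites found." if not issues else f"{len(issues)} problem(s) found.")
```

Each check degrades to a readable message instead of an exception, so a missing tool is reported alongside the others rather than stopping the script at the first failure.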

Quick start

Execute the following commands to start pre-built images with all the dependencies:

$ git clone https://github.com/instill-ai/vdp.git && cd vdp

# Launch all services
$ make all

🚀 That's it! Once all the services are up and report a healthy status, the UI is ready to go at http://localhost:3000!
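Because the first start can take a while, a script may need to wait for the Console before continuing. The following hypothetical Python sketch polls http://localhost:3000 with the standard library until it answers with a non-5xx status. It only checks that the Console responds over HTTP and does not read the containers' Docker health status (docker ps shows that as "(healthy)" for containers that define a health check). The 600-second deadline and 5-second interval are arbitrary assumptions.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def console_is_up(url: str = "http://localhost:3000", timeout: float = 5.0) -> bool:
    """True if the URL answers with a non-5xx HTTP status; False on errors or no connection."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status < 500
    except urllib.error.HTTPError as err:  # 4xx/5xx responses raise HTTPError
        return err.code < 500
    except OSError:  # URLError, refused connections and timeouts are OSError subclasses
        return False


def wait_until_ready(
    probe: Callable[[], bool],
    deadline_s: float = 600.0,
    interval_s: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Call probe() until it returns True or deadline_s seconds have elapsed."""
    start = clock()
    while True:
        if probe():
            return True
        if clock() - start >= deadline_s:
            return False
        sleep(interval_s)
```

After make all, wait_until_ready(console_is_up) returns True once the Console responds. The sleep and clock parameters are injectable so the polling loop can be tested without waiting in real time.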

VDP Console

Jump right in with VDP 101: Create your first pipeline on VDP, then explore the other VDP tutorials.

Note

The images for model-backend (~2GB) and Triton Inference Server (~23GB) can take a while to pull, but this is a one-time cost during the first setup.

Shut down VDP

To shut down all running services:

$ make down

Guidance philosophy

VDP is built with an open heart, and we expect it to gain more MLOps integrations. It follows microservice and API-first design principles. Instead of building all components from scratch, we've decided to adopt sophisticated open-source tools:

We hope VDP can also enrich the open-source community by bringing more practical use cases to unstructured data processing.

Documentation

📔 Documentation

Check out the documentation & tutorials to learn VDP!

📘 API Reference

The gRPC protocols defined in protobufs are the single source of truth for the VDP APIs. The canonical protobuf documentation can be found in our Buf Schema Registry (BSR).

For the OpenAPI documentation, visit http://localhost:3001 after running make all, or simply run make doc.

Model Hub

We curate a list of ready-to-use models for VDP. These models come from different sources and have been tested by our team. Want to contribute a new model? Please create an issue; we are happy to test it and add it to the list 👍.

| Model | Task | Sources | Framework | CPU | GPU |
|---|---|---|---|---|---|
| MobileNet v2 | Image Classification | GitHub-DVC | ONNX | ✅ | ✅ |
| Vision Transformer (ViT) | Image Classification | Hugging Face | ONNX | ✅ | ❌ |
| YOLOv4 | Object Detection | GitHub-DVC | ONNX | ✅ | ✅ |
| YOLOv7 | Object Detection | GitHub-DVC | ONNX | ✅ | ✅ |
| YOLOv7 W6 Pose | Keypoint Detection | GitHub-DVC | ONNX | ✅ | ✅ |
| PSNet + EasyOCR | Optical Character Recognition (OCR) | GitHub-DVC | ONNX | ✅ | ✅ |
| Mask RCNN | Instance Segmentation | GitHub-DVC | PyTorch | ✅ | ✅ |
| Lite R-ASPP based on MobileNetV3 | Semantic Segmentation | GitHub-DVC | ONNX | ✅ | ✅ |
| Stable Diffusion | Text to Image | GitHub-DVC, Local-CPU, Local-GPU | ONNX | ✅ | ✅ |
| Megatron GPT2 | Text Generation | GitHub-DVC | FasterTransformer | ❌ | ✅ |

Note: The GitHub-DVC source in the table means importing a model into VDP from a GitHub repository that uses DVC to manage large files.
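When choosing models for a machine without an NVIDIA GPU, or for a GPU-only deployment, the CPU and GPU columns of the Model Hub table are the deciding factor. The following hypothetical Python sketch simply transcribes that table and filters it by hardware. ModelEntry, MODEL_HUB and runnable_on are illustrative names, not part of VDP's API, and the data will go stale if the curated list changes.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ModelEntry:
    name: str
    task: str
    framework: str
    cpu: bool  # "CPU" column of the Model Hub table
    gpu: bool  # "GPU" column of the Model Hub table


# Transcribed from the Model Hub table above (Sources column omitted).
MODEL_HUB = [
    ModelEntry("MobileNet v2", "Image Classification", "ONNX", True, True),
    ModelEntry("Vision Transformer (ViT)", "Image Classification", "ONNX", True, False),
    ModelEntry("YOLOv4", "Object Detection", "ONNX", True, True),
    ModelEntry("YOLOv7", "Object Detection", "ONNX", True, True),
    ModelEntry("YOLOv7 W6 Pose", "Keypoint Detection", "ONNX", True, True),
    ModelEntry("PSNet + EasyOCR", "Optical Character Recognition (OCR)", "ONNX", True, True),
    ModelEntry("Mask RCNN", "Instance Segmentation", "PyTorch", True, True),
    ModelEntry("Lite R-ASPP based on MobileNetV3", "Semantic Segmentation", "ONNX", True, True),
    ModelEntry("Stable Diffusion", "Text to Image", "ONNX", True, True),
    ModelEntry("Megatron GPT2", "Text Generation", "FasterTransformer", False, True),
]


def runnable_on(hardware: str) -> List[str]:
    """Names of models whose table entry marks the given hardware ('cpu' or 'gpu') as supported."""
    if hardware not in ("cpu", "gpu"):
        raise ValueError(f"hardware must be 'cpu' or 'gpu', got {hardware!r}")
    return [entry.name for entry in MODEL_HUB if getattr(entry, hardware)]
```

For example, runnable_on("cpu") lists every model except Megatron GPT2, which the table marks as GPU-only.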

Community support

For general help using VDP, you can use one of these channels:

  • GitHub - bug reports, feature requests, project discussions and contributions

  • Discord - live discussion with the community and our team

  • Newsletter & Twitter - get the latest updates

If you are interested in a hosted VDP service, we've started signing up users for our private alpha. Get early access and we'll contact you when we're ready.

Contributing

We welcome contributions to VDP in any form:

Note: Code in the main branch tracks under-development progress towards the next release and may not work as expected. If you are looking for a stable alpha version, please use the latest release.

License

See the LICENSE file for licensing information.

We're hiring 🚀

Interested in building VDP with us? Join our remote team and build the future of unstructured data ETL. Check out our open roles.

vdp's People

Contributors

pinglin, xiaofei-du, droplet-bot, donch1989, phelan164, heiruwu, eiffelfly, dependabot[bot], sarthak-instill, denizparlak, bryan107
