
eora-ai / inferoxy


Service for quickly deploying and using dockerized Computer Vision models

License: GNU General Public License v3.0

Dockerfile 0.57% Makefile 0.60% Python 98.02% Shell 0.80%
mlops python computer-vision machine-learning pipelines

inferoxy's Issues

Model should have the ability to run infinitely

There are currently up and down triggers for models. But if Inferoxy is used to host a single model, there is no need to release that model: it should run without interruption in order to avoid the "cold start" effect. Otherwise we wait ~10 seconds for the model to start even when a very simple model is used.
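A minimal sketch of how such a toggle could work, assuming a hypothetical run_forever flag on the model and reusing the names from Inferoxy's existing down-trigger check (the flag itself is illustrative, not an existing setting):

    import time

    KEEP_MODEL_SECONDS = 10  # illustrative idle timeout, stands in for the keep_model config value

    def should_stop_instance(model_instance):
        # Hypothetical flag: a model marked run_forever is never released,
        # so a single hosted model avoids the "cold start" delay entirely.
        if getattr(model_instance.model, "run_forever", False):
            return False
        idle = time.time() - model_instance.sender.get_time_of_last_sent_batch()
        return idle > KEEP_MODEL_SECONDS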

Multiple images at input

There is a need to send multiple images per request item, for example when running a model for an image retrieval task that accepts images of a product from different angles of view. The number of images per product is arbitrary, so it looks like we could make a request object that carries a list of tensors of different sizes.
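A minimal sketch of what such a request object could look like, assuming a hypothetical RequestObject dataclass whose inputs field holds a list of arrays of different sizes (the class and field names are illustrative, not Inferoxy's actual API):

    from dataclasses import dataclass, field
    from typing import List

    import numpy as np

    @dataclass
    class RequestObject:
        # One logical item (a single product) with an arbitrary number of views.
        source_id: str
        inputs: List[np.ndarray] = field(default_factory=list)  # tensors may differ in shape
        parameters: dict = field(default_factory=dict)

    # Example: three views of the same product at different resolutions.
    request = RequestObject(
        source_id="product-42",
        inputs=[
            np.zeros((224, 224, 3), dtype=np.uint8),
            np.zeros((480, 640, 3), dtype=np.uint8),
            np.zeros((300, 300, 3), dtype=np.uint8),
        ],
    )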

Speed of request processing.

I have a problem with very slow request processing.

Ten parallel (or sequential) requests, each with only one image as input, take ±8 minutes to process, while a single (first) request takes ±10 seconds to complete.

In the logs it looks like Inferoxy is waiting for something before processing the image.
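For reference, a minimal sketch of how the timings above can be measured, assuming a hypothetical send_request() helper that stands in for the actual client code attached in the archive below (the helper is illustrative):

    import time

    def send_request(image_path):
        # Stand-in for the real client call to Inferoxy; replace with the
        # code from the attached archive.
        time.sleep(0.1)

    start = time.perf_counter()
    send_request("test.jpg")
    print(f"first request: {time.perf_counter() - start:.1f} s")  # ±10 s observed

    start = time.perf_counter()
    for _ in range(10):
        send_request("test.jpg")
    print(f"ten sequential requests: {time.perf_counter() - start:.1f} s")  # ±8 min observed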

There is a log:
log.log

The image:
test

The model can be downloaded from Docker Hub:
docker pull smthngslv/clip_vit-b32_no_proj:latest

Code, image, log.
Archive.zip

Also, is there a setting that keeps the container from shutting down between requests (i.e. runs sequential requests on the same container)?

Drop frames in video processing

Expected behavior

Pass a video to Inferoxy for processing. It is processed using some model and returned in the same order in which it was sent.

Examples:
Input: https://api.dev.visionhub.ru/public_media/66218c4f-7b14-45b0-8669-370dee03ffa6
Output: https://api.dev.visionhub.ru/public_media/1eb3f365-00d7-4b66-a134-0d3b952eb89c

Current behavior

There is frame dropping.

We cannot guarantee ordering because of retriable errors. For example, suppose three batches are passed to Inferoxy in order.

3 -> 2 -> 1 -> inferoxy

The first batch is processed. When the second batch starts processing, the model fails because its processing instance has disappeared.
We put the second batch at the end of the queue.

2 -> 3 -> 1 -> inferoxy

Possible solution

Again, we do not guarantee order, but users can send an index of the input in the request parameters and sort the output on their side (see the sketch after the steps below).

Steps to reproduce

  1. Read video in frames
  2. Send frames in order to Inferoxy
  3. Receive results from Inferoxy
  4. Build a video
  5. Compare the input and output videos to see frame dropping and reordering
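A minimal sketch of the client-side workaround described above, assuming the sender stores a hypothetical frame_index value in each request's parameters and that results come back as (parameters, output) pairs (both names are illustrative):

    # Sender side: attach an index to each frame's request parameters
    # (the "frame_index" key is illustrative, not a built-in Inferoxy field).
    frames = ["frame0", "frame1", "frame2"]  # stand-ins for decoded video frames
    requests = [
        {"data": frame, "parameters": {"frame_index": i}}
        for i, frame in enumerate(frames)
    ]

    # Receiver side: results may arrive out of order because retried batches
    # are re-queued, so restore the original order before rebuilding the video.
    results = [
        {"output": "out2", "parameters": {"frame_index": 2}},
        {"output": "out0", "parameters": {"frame_index": 0}},
        {"output": "out1", "parameters": {"frame_index": 1}},
    ]
    results.sort(key=lambda item: item["parameters"]["frame_index"])
    ordered_outputs = [item["output"] for item in results]  # ["out0", "out1", "out2"]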

Support of multiple models on a single GPU

Let's imagine the case where we have a stateful model consuming 2 GB of GPU memory while processing an RTSP stream. If we have a GPU with 16 GB of memory, then theoretically we can run up to 8 copies of such a model. This is currently possible with stateless models, but RTSP streams are usually processed by people trackers or similar models, which are stateful. The issue was first mentioned by @tz3, who is designing a system where 8 RTSP streams are processed in parallel.

Profile REST API latency

Even a simple REST API request to a CPU model is processed slowly (~10 seconds). The reason is probably latency, since the computation overhead is minimal. We need to profile the network during simple requests.
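A minimal sketch of how that profiling could start, assuming a hypothetical HTTP endpoint for the REST gateway (the URL and payload are illustrative):

    import time

    import requests  # third-party HTTP client, assumed to be installed

    URL = "http://localhost:8000/infer"         # hypothetical REST gateway endpoint
    PAYLOAD = {"image": "base64-encoded data"}  # illustrative request body

    # Time several identical requests: if only the first one is slow, the cost
    # is model start-up ("cold start"); if all of them are slow, the overhead
    # is in the network / queueing path and is worth profiling further.
    for i in range(5):
        start = time.perf_counter()
        requests.post(URL, json=PAYLOAD, timeout=120)
        print(f"request {i}: {time.perf_counter() - start:.2f} s")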

Re-usage of a stateful model.

In the documentation:

Down triggers (↓):
time of last use for source_id > T_max - in this case either model release or instance stopping happens depending on whether there are incoming requests to this model

But currently in the code:

            if (
                time.time() - model_instance.sender.get_time_of_last_sent_batch()
                > self.config.load_analyzer.stateful_checker.keep_model
            ):
                triggers += [self.make_decrease_trigger(model_instance=model_instance)]

There is no check for incoming requests, just the deletion procedure.
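For comparison, a minimal sketch of what the documented behaviour could look like, where has_incoming_requests() and make_release_trigger() are hypothetical names and the rest follows the snippet above:

    import time

    def stateful_down_triggers(checker, model_instance):
        idle = time.time() - model_instance.sender.get_time_of_last_sent_batch()
        if idle <= checker.config.load_analyzer.stateful_checker.keep_model:
            return []
        if model_instance.has_incoming_requests():
            # Requests are still arriving: release the model for other sources
            # instead of stopping the instance.
            return [checker.make_release_trigger(model_instance=model_instance)]
        # No incoming requests: stop the instance.
        return [checker.make_decrease_trigger(model_instance=model_instance)]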
