
Comments (3)

glenn-jocher commented on July 21, 2024

@abelBEDOYA hello,

Thank you for providing detailed information and screenshots regarding your batch inference issue. To help us investigate further, could you please share a minimal reproducible code example? This will allow us to replicate the issue on our end. You can refer to our guide on creating a minimal reproducible example here: Minimum Reproducible Example.

Additionally, please ensure that you are using the latest versions of torch and ultralytics. If not, kindly update your packages and try running your tests again to see if the issue persists.
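
As a quick way to confirm your environment, the ultralytics package includes a checks() utility that prints the installed versions along with CUDA availability:

import torch
import ultralytics

# Print installed versions
print(f"torch: {torch.__version__}")
print(f"ultralytics: {ultralytics.__version__}")

# Summarize the environment (Python, torch, CUDA device, memory, ...)
ultralytics.checks()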

Regarding your observations, it's important to note that while batch inference can offer speed improvements, the actual performance gain depends on various factors such as GPU memory bandwidth, the complexity of the model, and the overhead of batching operations. The linear increase in time you are observing might be due to these factors.

Here's a quick example of how you can perform batch inference using the ultralytics library:

from ultralytics import YOLO
import torch
import time

# Load the YOLOv8 model
model = YOLO("yolov8n.pt")

# Prepare a batch of 8 images as a single BCHW tensor.
# Tensor inputs must be float32 with values scaled to [0, 1].
images = torch.rand(8, 3, 640, 640)

# Measure inference time for batch processing
start_time = time.time()
results = model.predict(images)
end_time = time.time()

print(f"Batch inference time: {end_time - start_time:.3f} seconds")

Feel free to adjust the batch size and image dimensions as per your requirements. If you continue to experience issues, please share the code you are using for both the loop and batch inference tests.

Looking forward to your response!


abelBEDOYA commented on July 21, 2024

Hi, I've been testing your point and there is no difference in processing time between batch inference and a simple loop over a list of images. These are the measurements:
[Screenshot from 2024-06-21: plot of processing time vs. number of images for looped and batch inference]

This plot is the output of the following script (a minimal reproducible example):

from ultralytics import YOLO
import torch
import time
import numpy as np
import matplotlib.pyplot as plt

# Load the YOLOv8 model
model = YOLO("yolov8m.pt")
nn = [2, 3, 4, 5, 6, 7, 8, 9, 11, 13, 16, 19, 24, 29, 34, 40, 47, 56]  # numbers of images to test
n_samples = 5  # repetitions averaged per measurement
tt_batch = []
tt_loop = []

# Warm-up run so model loading / CUDA initialization is not timed
r = model.predict(torch.sigmoid(torch.randn(1, 3, 640, 640)))

## LOOPING INFERENCE:
for n in nn:
    t_ = []
    for _ in range(n_samples):
        images = [torch.sigmoid(torch.randn(1, 3, 640, 640)) for _ in range(n)] 
        start_time = time.time()
        for img in images:
            results = model.predict(img, verbose=False)
        end_time = time.time()
        t_.append(end_time-start_time)

    t_m = np.mean(t_)
    tt_loop.append(t_m)
    print(n,': ', t_m)

# Reload the model and warm up again so the batch test starts from the same state
model = YOLO("yolov8m.pt")
r = model.predict(torch.sigmoid(torch.randn(1, 3, 640, 640)))


## BATCH INFERENCE:
for n in nn:
    t_ = []
    parar = False  # "parar" (Spanish for "stop"): set when a batch fails, e.g. out of memory
    for _ in range(n_samples):
        images = torch.sigmoid(torch.randn(n, 3, 640, 640))  # one BCHW batch of n images
        start_time = time.time()
        try:
            results = model.predict(images, verbose=False)
        except Exception:  # typically CUDA out-of-memory at larger batch sizes
            parar = True
            break
        end_time = time.time()
        t_.append(end_time-start_time)
    if parar:
        break
    t_m = np.mean(t_)
    tt_batch.append(t_m)
    print(n,': ', t_m)




plt.plot(nn, tt_loop, label='looping', color = 'r')
plt.plot(nn, tt_loop, 'o', color = 'r')
plt.plot(nn[:len(tt_batch)], tt_batch, label='batch_inference', color = 'blue')
plt.plot(nn[:len(tt_batch)], tt_batch, 'o', color = 'blue')
plt.legend(loc='best', frameon=True)
plt.xlabel('number of images')
plt.ylabel('processing time (s)')
plt.show()


glenn-jocher commented on July 21, 2024

Hi @abelBEDOYA,

Thank you for providing a detailed minimal reproducible example and the results of your tests. It's great to see such a thorough investigation! 😊

From your observations, it appears that the batch inference time scales linearly with the number of images, similar to looping through individual images. This behavior can be influenced by several factors, including GPU memory bandwidth, the overhead of batching operations, and the specific implementation details of the model and inference engine.

Here are a few points to consider:

  1. Batch Size and GPU Utilization: The efficiency of batch processing can vary depending on the batch size and the GPU's ability to handle multiple images simultaneously. Smaller batch sizes might not fully utilize the GPU, while larger batch sizes could lead to memory bottlenecks (see the timing sketch after this list).

  2. Model Complexity: The complexity of the YOLOv8 model can also impact the performance gains from batching. More complex models might not see as significant speedups from batching due to the overhead of managing larger tensors.

  3. Inference Engine: The underlying inference engine (PyTorch in this case) might have optimizations that affect how batch processing is handled compared to individual image processing.
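
On the first point, one quick check is whether the per-image cost actually drops as the batch grows. Here's a minimal sketch (the batch sizes are arbitrary; the explicit torch.cuda.synchronize() calls make sure queued GPU work is included in the timing):

import time
import torch
from ultralytics import YOLO

model = YOLO("yolov8m.pt")
model.predict(torch.rand(1, 3, 640, 640), verbose=False)  # warm-up run

for batch_size in (1, 4, 16, 32):
    images = torch.rand(batch_size, 3, 640, 640)  # BCHW tensor, values in [0, 1]
    if torch.cuda.is_available():
        torch.cuda.synchronize()  # ensure previous GPU work has finished
    t0 = time.time()
    model.predict(images, verbose=False)
    if torch.cuda.is_available():
        torch.cuda.synchronize()  # wait for this batch to complete before stopping the clock
    print(f"batch {batch_size:3d}: {(time.time() - t0) / batch_size * 1000:.1f} ms/image")

If the ms/image figure stays flat across batch sizes, the GPU is already saturated at batch size 1 and batching cannot help much, which would explain the linear scaling you observed.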

To further investigate, you might want to try the following:

  • Experiment with Different Batch Sizes: Test with varying batch sizes to see if there's an optimal size that provides better performance.
  • Profile GPU Utilization: Use tools like NVIDIA's nvidia-smi to monitor GPU utilization and memory usage during batch and looped inference to identify any bottlenecks (a PyTorch-level sketch follows this list).
  • TensorRT Export: Consider exporting your model to TensorRT for potentially better batch inference performance. TensorRT optimizes models for NVIDIA GPUs and can provide significant speedups. You can find more details on exporting to TensorRT here.
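
On the profiling point, PyTorch's built-in memory counters offer a quick in-script alternative to nvidia-smi. A minimal sketch, assuming a CUDA device is available:

import torch
from ultralytics import YOLO

model = YOLO("yolov8m.pt")

for batch_size in (1, 8, 32):
    torch.cuda.reset_peak_memory_stats()  # clear the peak-memory counter
    model.predict(torch.rand(batch_size, 3, 640, 640), verbose=False)
    peak_gb = torch.cuda.max_memory_allocated() / 1e9
    print(f"batch {batch_size:3d}: peak GPU memory {peak_gb:.2f} GB")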

Here's a quick example of how to export to TensorRT and run inference:

from ultralytics import YOLO
import torch

# Load the YOLOv8 model
model = YOLO("yolov8m.pt")

# Export the model to TensorRT format. TensorRT engines are built for a
# fixed batch size by default, so export with the batch size you intend
# to use at inference time.
model.export(format="engine", batch=8)  # creates 'yolov8m.engine'

# Load the exported TensorRT model
tensorrt_model = YOLO("yolov8m.engine")

# Run batch inference on a BCHW tensor with values in [0, 1]
images = torch.rand(8, 3, 640, 640)  # example batch of 8 images
results = tensorrt_model.predict(images)
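
To quantify the speedup on your hardware, you could time both backends on the same input, e.g. (a sketch; assumes the export above succeeded and 'yolov8m.engine' sits in the working directory):

import time
import torch
from ultralytics import YOLO

images = torch.rand(8, 3, 640, 640)  # identical input for both backends

for weights in ("yolov8m.pt", "yolov8m.engine"):
    model = YOLO(weights)
    model.predict(images, verbose=False)  # warm-up run
    t0 = time.time()
    for _ in range(10):
        model.predict(images, verbose=False)
    print(f"{weights}: {(time.time() - t0) / 10:.3f} s per batch of 8")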

I hope this helps! If you have any further questions or need additional assistance, feel free to ask. We're here to help! 😊

