Export YOLOv8 by using TensorRT with INT8 calibration (ultralytics issue, open)

pornpra commented on June 16, 2024

Export YOLOv8 by using TensorRT with INT8 calibration

from ultralytics.

Comments (8)

glenn-jocher commented on June 16, 2024

Hi there! Here's how you can handle exporting your YOLOv8 model to TensorRT with INT8 calibration:

Calibration Dataset Size: Generally, for INT8 calibration, using at least 1000 images from your dataset is advised to minimize any significant drop in accuracy. More would be better to represent the dataset's variability effectively.
Custom Calibration Dataset: You can specify a custom dataset for calibration by setting the data argument of model.export(). Point it at your dataset configuration file, in the same way coco.yaml would be used:

from ultralytics import YOLO

model = YOLO("path/to/your/custom_yolov8.pt") # ensure this points to your trained model
model.export(
    format="engine",
    dynamic=True,
    batch=8,
    workspace=4,
    int8=True,
    data="path/to/your/custom_dataset.yaml", # specify your dataset configuration here
)

# load the exported TensorRT engine; it is written next to the .pt file with an .engine suffix
model = YOLO("path/to/your/custom_yolov8.engine", task="detect")

Just replace "path/to/your/custom_dataset.yaml" with the path to your dataset configuration file. This customization will enable the use of your intended dataset during the calibration process.

Happy exporting! 😊

pornpra commented on June 16, 2024

@glenn-jocher Thanks for your response. I have two more questions about INT8 calibration:

Can you share the file coco.yaml, or suggest where I can download it? I need to see the details inside this file to build my custom_dataset.yaml. Currently, I have many images (all in .jpg format) for the calibration process in my directory, but I'm unsure how to use them to construct custom_dataset.yaml.
Can I use the latest Ultralytics version for exporting to INT8 and then use it (engine format) in an older version? Is it supported?

glenn-jocher commented on June 16, 2024

Hi! Great to see your interest in diving deeper into INT8 calibration with YOLOv8! 😊

coco.yaml file: You can find coco.yaml in the Ultralytics GitHub repository under ultralytics/cfg/datasets/. A typical dataset file looks like this; customize it for your dataset structure:

train: ../coco/train2017  # 118k images
val: ../coco/val2017  # 5k images
nc: 80
names: [ 'person', 'bicycle', ...]  # Class names

For your custom_dataset.yaml, ensure you specify the paths to your training and validation data accordingly, set nc (number of classes), and define your class names.
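
For a directory that only holds .jpg images, a dataset file can be generated with a few lines of Python. This is a minimal sketch, not an official Ultralytics utility: the helper name write_dataset_yaml, the directory layout, and the class names are assumptions for illustration. It uses the path/train/val/names keys of the Ultralytics dataset format and points both train and val at the same folder, on the assumption that INT8 calibration reads its images from the val split.

```python
import tempfile
from pathlib import Path


def write_dataset_yaml(root, image_dir, class_names, out_file):
    """Write a minimal Ultralytics-style dataset YAML and return its path."""
    lines = [
        f"path: {Path(root).as_posix()}",  # dataset root directory
        f"train: {image_dir}",  # image folder, relative to path
        f"val: {image_dir}",  # assumed to be the split used for calibration
        "names:",
    ]
    # One "index: name" entry per class, in class-index order
    lines += [f"  {i}: {name}" for i, name in enumerate(class_names)]
    out = Path(out_file)
    out.write_text("\n".join(lines) + "\n")
    return out


# Demo with hypothetical values, written to a temporary directory
demo_dir = Path(tempfile.mkdtemp())
yaml_path = write_dataset_yaml(
    root="/home/datasets/custom",  # hypothetical dataset root
    image_dir="images/calib",  # hypothetical folder holding the .jpg files
    class_names=["car", "truck"],  # hypothetical class names
    out_file=demo_dir / "custom_dataset.yaml",
)
print(yaml_path.read_text())
```

The resulting file path is what would be passed as the data argument of model.export().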

Version Compatibility: It is generally advisable to use the same Ultralytics version for both exporting and deploying your models. Backward compatibility may hold in some cases, but mixing versions can lead to unexpected issues due to differences in implementations and dependencies. Keep in mind that a TensorRT engine is also tied to the TensorRT version and GPU it was built with, so build the engine on the target device where possible.

I hope this helps! Let us know if there's anything else you need. 🌟

pornpra commented on June 16, 2024

@glenn-jocher

Can I use only the validation set for calibration, or do I need to use both the training set and a calibration set?

glenn-jocher commented on June 16, 2024

Hi! For INT8 calibration, using just the validation set can be sufficient, especially if it's representative of the kind of data the model will encounter in deployment. However, including diverse images from the training set can further enhance the robustness of your calibrated model. Just ensure that the chosen images are varied enough to cover different scenarios. Happy calibrating! 😊
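
One way to mix validation and training images is to draw a reproducible random sample from both folders and write the selected paths to a list file. The sketch below is only an illustration: the helper name sample_calibration_images, the folder names, and the calib.txt file name are assumptions, and the demo runs on empty placeholder files. Ultralytics dataset files can reference a .txt list of image paths (coco.yaml does this for its splits), so such a list could serve as the val entry of a calibration dataset file.

```python
import random
import tempfile
from pathlib import Path


def sample_calibration_images(image_dirs, n_images, seed=0):
    """Return a reproducible random sample of .jpg paths drawn from several folders."""
    candidates = sorted(p for d in image_dirs for p in Path(d).glob("*.jpg"))
    n = min(n_images, len(candidates))  # never ask for more images than exist
    return sorted(random.Random(seed).sample(candidates, n))


# Demo on empty placeholder files in a temporary directory
root = Path(tempfile.mkdtemp())
for split, count in (("val", 6), ("train", 10)):
    (root / split).mkdir()
    for i in range(count):
        (root / split / f"img_{i}.jpg").touch()

subset = sample_calibration_images([root / "val", root / "train"], n_images=8)

# Write one image path per line so the list can be referenced from a dataset file
list_file = root / "calib.txt"
list_file.write_text("\n".join(p.as_posix() for p in subset) + "\n")
print(f"{len(subset)} images listed in {list_file}")
```

Fixing the seed keeps the calibration set identical between exports, which makes accuracy comparisons between engines easier to interpret.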

pornpra commented on June 16, 2024

@glenn-jocher
I have successfully converted YOLOv8s from FP32 to INT8 without any errors, thanks for your support. Next, I use the following Python code to test the performance of INT8 in terms of inference time and RAM usage, comparing it with YOLOv8s in FP32 (.pt) and FP16 (.engine) formats, using a single image.

My environment

NVIDIA Jetson Orin Nano (8GB)
Python 3.8.10
Ultralytics 8.2.13
TensorRT 8.5.2.2

import time

import torch
from memory_profiler import memory_usage
from ultralytics import YOLO

device = "cuda" if torch.cuda.is_available() else "cpu"
source = '/home/frame446.jpg'

# Function to measure memory and run inference
def measure_memory_and_time():
    # The timer starts before the engine is loaded, so the reported time
    # covers model loading plus one predict() call, not inference alone
    start_time = time.time()
    model = YOLO("/home/int8/yolov8s_int8.engine", task="detect")
    results = model.predict(source, device=device)
    end_time = time.time()
    inference_time = end_time - start_time
    print(f"Inference Time: {inference_time} seconds")
    return results, inference_time

# Measure baseline memory of the current process before the model is loaded
initial_memory = memory_usage(-1, interval=0.05, timeout=1)

# Run the function and measure memory during execution
mem_usage = memory_usage(
    (measure_memory_and_time, (), {}), 
    interval=0.05, 
    include_children=True,
    max_usage=False,
    retval=False
)

# Measure final peak memory usage
final_peak_memory = max(mem_usage)

# Calculate average memory usage
average_memory = sum(mem_usage) / len(mem_usage)

print(f"Peak Memory Usage: {final_peak_memory} MiB")
print(f"Average Memory Usage: {average_memory} MiB")

These are my test results

FP32

image 1/1 /home/frame446.jpg: 384x640 4 cars, 290.5ms
Speed: 8.8ms preprocess, 290.5ms inference, 420.8ms postprocess per image at shape (1, 3, 384, 640)
Inference Time: 6.397905588150024 seconds
Peak Memory Usage: 3443.51953125 MiB
Average Memory Usage: 2141.6328519570707 MiB

FP16

image 1/1 /home/frame446.jpg: 640x640 4 cars, 23.5ms
Speed: 8.9ms preprocess, 23.5ms inference, 444.8ms postprocess per image at shape (1, 3, 640, 640)
Inference Time: 5.507491827011108 seconds
Peak Memory Usage: 3884.09375 MiB
Average Memory Usage: 2270.1096929505816 MiB

INT8

image 1/1 /home/frame446.jpg: 640x640 2 cars, 12.1ms
Speed: 17.8ms preprocess, 12.1ms inference, 578.6ms postprocess per image at shape (1, 3, 640, 640)
Inference Time: 7.327944040298462 seconds
Peak Memory Usage: 3619.57421875 MiB
Average Memory Usage: 2443.6972307477677 MiB

My new question is:

Why does INT8 consume the highest average RAM usage compared to FP32 and FP16? Doesn't it reduce RAM usage during inference?

glenn-jocher commented on June 16, 2024

Hi! Great to hear that you've successfully converted your YOLOv8s model to INT8 without any issues and are diving into performance testing! 🚀

Regarding your question on the higher RAM usage with INT8 compared to FP32 and FP16, this can sometimes happen due to a few reasons:

Model Loading and CUDA Overhead: INT8 models can incur additional overhead when loaded and initialized, particularly in CUDA environments. This includes memory for quantization tables and necessary alignment in memory.
TensorRT Execution Context: When running under TensorRT, the execution context for INT8 models might allocate extra workspace memory as part of its optimization strategies, which might not be as evident in FP16 or FP32 configurations.

Even though INT8 reduces the size of the weights and can accelerate the inference time, the total memory usage also depends on these factors along with how memory management is handled during the execution of these quantized models.
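
A rough calculation shows why smaller weights barely move the totals reported above. It assumes about 11.2 million parameters for YOLOv8s (the figure in the Ultralytics model table) and that every weight is stored at the named precision, which a real INT8 engine may not do for all layers. It compares weight storage alone with the peak measured in the INT8 run; the CUDA context, the TensorRT runtime, Python and its libraries, and activation buffers are outside this estimate.

```python
PARAMS = 11.2e6  # assumed YOLOv8s parameter count
PEAK_INT8_MIB = 3619.57421875  # peak usage reported for the INT8 run above
BYTES_PER_WEIGHT = {"FP32": 4, "FP16": 2, "INT8": 1}

# Storage needed for the weights alone at each precision, in MiB
weight_mib = {name: PARAMS * size / 2**20 for name, size in BYTES_PER_WEIGHT.items()}

for name, mib in weight_mib.items():
    share = 100 * mib / PEAK_INT8_MIB
    print(f"{name}: ~{mib:.1f} MiB of weights ({share:.2f}% of the INT8 run's peak)")
```

Under these assumptions the weights account for well under 2% of the measured peak at any precision, so differences between runs are dominated by runtime overhead and not by weight size.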

Feel free to monitor and tune the TensorRT settings, particularly around workspace allocations, to potentially reduce memory usage!
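
In the test script above, the timer starts before YOLO(...) is constructed, so the reported "Inference Time" of several seconds covers engine loading and first-call setup, while the per-image log lines show the actual inference latency. A minimal sketch of a benchmark that separates the two is given below. The helper name benchmark and the stand-in callables are assumptions made so the example runs without a GPU; with Ultralytics, the callables would wrap the model constructor and model.predict.

```python
import time
from statistics import mean


def benchmark(load_fn, infer_fn, warmup=3, runs=20):
    """Time model loading once, then average per-call latency after warm-up."""
    t0 = time.perf_counter()
    model = load_fn()
    load_s = time.perf_counter() - t0

    for _ in range(warmup):  # warm-up calls are excluded from the timing
        infer_fn(model)

    latencies = []
    for _ in range(runs):
        t0 = time.perf_counter()
        infer_fn(model)
        latencies.append(time.perf_counter() - t0)
    return {"load_s": load_s, "mean_infer_s": mean(latencies), "runs": runs}


# Stand-in callables so the sketch runs anywhere; with Ultralytics one could
# pass something like load_fn=lambda: YOLO(engine_path, task="detect") and
# infer_fn=lambda m: m.predict(source, verbose=False), where engine_path and
# source are your own paths
def fake_load():
    time.sleep(0.05)  # pretend loading is slow
    return object()


def fake_infer(model):
    time.sleep(0.001)  # pretend each call is fast


stats = benchmark(fake_load, fake_infer, warmup=2, runs=5)
print(stats)
```

Averaging over several runs after warm-up also gives a fairer comparison between FP32, FP16, and INT8 than a single image, and the same input size should be used for every format.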

github-actions commented on June 16, 2024

👋 Hello there! We wanted to give you a friendly reminder that this issue has not had any recent activity and may be closed soon, but don't worry - you can always reopen it if needed. If you still have any questions or concerns, please feel free to let us know how we can help.

For additional resources and information, please see the links below:

Docs: https://docs.ultralytics.com
HUB: https://hub.ultralytics.com
Community: https://community.ultralytics.com

Feel free to inform us of any other issues you discover or feature requests that come to mind in the future. Pull Requests (PRs) are also always welcomed!

Thank you for your contributions to YOLO 🚀 and Vision AI ⭐
