Hi there! Here's how you can handle exporting your YOLOv8 model to TensorRT with INT8 calibration:
- Calibration Dataset Size: For INT8 calibration, using at least 1000 images from your dataset is generally advised to limit the drop in accuracy. More images help represent the dataset's variability.
- Custom Calibration Dataset: You can specify your custom dataset during export through the data argument. Point it at your dataset configuration file, the same way coco.yaml would be referenced:
from ultralytics import YOLO
model = YOLO("path/to/your/custom_yolov8.pt")  # ensure this points to your trained model
model.export(
    format="engine",
    dynamic=True,
    batch=8,
    workspace=4,
    int8=True,
    data="path/to/your/custom_dataset.yaml",  # specify your dataset configuration here
)
# The engine is written next to the .pt file with the same stem
model = YOLO("path/to/your/custom_yolov8.engine", task="detect")  # load the TensorRT model
Just replace "path/to/your/custom_dataset.yaml" with the path to your dataset configuration file. This customization will enable the use of your intended dataset during the calibration process.
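For the calibration set size mentioned above, one way to draw a fixed-size, reproducible subset from a larger image folder is sketched below. The function name, the extension list, and the default of 1000 are illustrative assumptions, not part of Ultralytics.

```python
import random
from pathlib import Path

# Assumed image extensions; adjust to match your data.
IMAGE_EXTS = {".jpg", ".jpeg", ".png"}


def sample_calibration_images(image_dir, n=1000, seed=0):
    """Return up to n image paths drawn at random from image_dir (recursive)."""
    paths = sorted(
        p for p in Path(image_dir).rglob("*")
        if p.is_file() and p.suffix.lower() in IMAGE_EXTS
    )
    if len(paths) <= n:
        # Fewer images than requested: use all of them.
        return paths
    # A seeded generator makes the selection repeatable across runs.
    return sorted(random.Random(seed).sample(paths, n))
```

The returned paths can then be copied or symlinked into the folder that your dataset configuration file points to.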
Happy exporting! 😊
from ultralytics.
@glenn-jocher Thanks for your response. I have two more questions about INT8 calibration:
- Can you share the file coco.yaml, or suggest where I can download it? I need to see the details inside this file to build my custom_dataset.yaml. Currently, I have many images (all in .jpg format) for the calibration process in my directory, but I'm unsure how to use them to construct custom_dataset.yaml.
- Can I use the latest Ultralytics version for exporting to INT8 and then use it (engine format) in an older version? Is it supported?
Hi! Great to see your interest in diving deeper into INT8 calibration with YOLOv8! 😊
- coco.yaml file: You can find the coco.yaml file in the Ultralytics GitHub repository under the ultralytics/cfg/datasets directory. Here is what a typical coco.yaml looks like; customize it to match your dataset structure:
train: ../coco/train2017 # 118k images
val: ../coco/val2017 # 5k images
nc: 80
names: [ 'person', 'bicycle', ...] # Class names
For your custom_dataset.yaml, ensure you specify the paths to your training and validation data accordingly, set nc (number of classes), and define your class names.
- Version Compatibility: Generally, it's advisable to use the same version of Ultralytics for both exporting and deploying your models. While backward compatibility might exist in some scenarios, using different versions can lead to unexpected issues due to differences in implementations and dependencies.
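As a rough illustration of the custom_dataset.yaml described above, the sketch below generates the file contents from a few inputs. The helper name and the example paths are hypothetical. It assumes simple class names that need no YAML quoting, and that the exporter takes its calibration images from the val entry, which is worth confirming for your Ultralytics version.

```python
def build_dataset_yaml(root, train_dir, val_dir, class_names):
    """Return the text of a minimal dataset YAML (hypothetical helper).

    root        -- dataset root directory
    train_dir   -- training images, relative to root
    val_dir     -- validation images, relative to root
    class_names -- list of class names, in class-index order
    """
    lines = [
        f"path: {root}",
        f"train: {train_dir}",
        f"val: {val_dir}",
        f"nc: {len(class_names)}",
        "names:",
    ]
    # One "index: name" entry per class, indented under "names:".
    lines += [f"  {i}: {name}" for i, name in enumerate(class_names)]
    return "\n".join(lines) + "\n"


# Example usage (paths and classes are placeholders):
# text = build_dataset_yaml("/data/mine", "images/train", "images/val", ["car"])
# with open("custom_dataset.yaml", "w") as f:
#     f.write(text)
```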
I hope this helps! Let us know if there's anything else you need. 🌟
Can I use only the validation set for calibration, or do I need to use both the training set and a calibration set?
Hi! For INT8 calibration, using just the validation set can be sufficient, especially if it's representative of the kind of data the model will encounter in deployment. However, including diverse images from the training set can further enhance the robustness of your calibrated model. Just ensure that the chosen images are varied enough to cover different scenarios. Happy calibrating! 😊
@glenn-jocher
I have successfully converted YOLOv8s from FP32 to INT8 without any errors, thanks for your support. Next, I use the following Python code to test the performance of INT8 in terms of inference time and RAM usage, comparing it with YOLOv8s in FP32 (.pt) and FP16 (.engine) formats, using a single image.
My environment
- NVIDIA Jetson Orin Nano (8GB)
- Python 3.8.10
- Ultralytics 8.2.13
- TensorRT 8.5.2.2
import time

import torch
from memory_profiler import memory_usage
from ultralytics import YOLO

device = "cuda" if torch.cuda.is_available() else "cpu"
source = '/home/frame446.jpg'

# Function to measure memory and run inference
def measure_memory_and_time():
    # Note: the timed region covers model loading as well as inference
    start_time = time.time()
    model = YOLO("/home/int8/yolov8s_int8.engine", task="detect")
    results = model.predict(source, device=device)
    end_time = time.time()
    inference_time = end_time - start_time
    print(f"Inference Time: {inference_time} seconds")
    return results, inference_time

# Measure initial memory usage of the current process (pid -1) for one second
initial_memory = memory_usage(-1, interval=0.05, timeout=1)

# Run the function and sample memory every 0.05 s during execution
mem_usage = memory_usage(
    (measure_memory_and_time, (), {}),
    interval=0.05,
    include_children=True,
    max_usage=False,
    retval=False,
)

# Peak and average of the sampled memory values
final_peak_memory = max(mem_usage)
average_memory = sum(mem_usage) / len(mem_usage)
print(f"Peak Memory Usage: {final_peak_memory} MiB")
print(f"Average Memory Usage: {average_memory} MiB")
These are my test results
FP32
image 1/1 /home/frame446.jpg: 384x640 4 cars, 290.5ms
Speed: 8.8ms preprocess, 290.5ms inference, 420.8ms postprocess per image at shape (1, 3, 384, 640)
Inference Time: 6.397905588150024 seconds
Peak Memory Usage: 3443.51953125 MiB
Average Memory Usage: 2141.6328519570707 MiB
FP16
image 1/1 /home/frame446.jpg: 640x640 4 cars, 23.5ms
Speed: 8.9ms preprocess, 23.5ms inference, 444.8ms postprocess per image at shape (1, 3, 640, 640)
Inference Time: 5.507491827011108 seconds
Peak Memory Usage: 3884.09375 MiB
Average Memory Usage: 2270.1096929505816 MiB
INT8
image 1/1 /home/frame446.jpg: 640x640 2 cars, 12.1ms
Speed: 17.8ms preprocess, 12.1ms inference, 578.6ms postprocess per image at shape (1, 3, 640, 640)
Inference Time: 7.327944040298462 seconds
Peak Memory Usage: 3619.57421875 MiB
Average Memory Usage: 2443.6972307477677 MiB
My new question is:
Why does INT8 show the highest average RAM usage compared to FP32 and FP16? Shouldn't it reduce RAM usage during inference?
Hi! Great to hear that you've successfully converted your YOLOv8s model to INT8 without any issues and are diving into performance testing! 🚀
Regarding your question on the higher RAM usage with INT8 compared to FP32 and FP16, this can sometimes happen due to a few reasons:
- Model Loading and CUDA Overhead: INT8 models can incur additional overhead when loaded and initialized, particularly in CUDA environments. This includes memory for quantization tables and necessary alignment in memory.
- TensorRT Execution Context: When running under TensorRT, the execution context for INT8 models might allocate extra workspace memory as part of its optimization strategies, which might not be as evident in FP16 or FP32 configurations.
Even though INT8 reduces the size of the weights and can accelerate the inference time, the total memory usage also depends on these factors along with how memory management is handled during the execution of these quantized models.
Feel free to monitor and tune the TensorRT settings, particularly around workspace allocations, to potentially reduce memory usage!
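When comparing precisions, it may also help to separate one-time model loading from steady-state inference, since the test script earlier in this thread times both together. The sketch below is one possible way to do that with plain callables. The benchmark function and its parameters are illustrative assumptions, not an Ultralytics API.

```python
import statistics
import time


def benchmark(load_fn, run_fn, warmup=3, runs=10):
    """Time model loading once, then time repeated warm inference calls.

    load_fn -- zero-argument callable that returns the loaded model
    run_fn  -- callable taking the model and performing one inference
    """
    t0 = time.perf_counter()
    model = load_fn()
    load_s = time.perf_counter() - t0

    # Warm-up calls absorb one-time setup costs and are not timed.
    for _ in range(warmup):
        run_fn(model)

    timings = []
    for _ in range(runs):
        t0 = time.perf_counter()
        run_fn(model)
        timings.append(time.perf_counter() - t0)

    return {
        "load_s": load_s,
        "mean_run_s": statistics.mean(timings),
        "min_run_s": min(timings),
        "runs": runs,
    }


# Example usage with the engine from this thread (requires ultralytics):
# from ultralytics import YOLO
# stats = benchmark(
#     lambda: YOLO("/home/int8/yolov8s_int8.engine", task="detect"),
#     lambda m: m.predict("/home/frame446.jpg"),
# )
```

Running the same function for the FP32, FP16, and INT8 models gives load and per-call figures that can be compared side by side.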