cyrusbehr / yolov8-tensorrt-cpp Goto Github PK

YOLOv8 TensorRT C++ Implementation

License: MIT License

CMake 8.19% Python 0.83% C++ 90.98%

computer-vision cpp machine-learning tensorrt yolo yolov8

yolov8-tensorrt-cpp's Introduction

👋 Nice to meet you！

I'm Cyrus, a computer vision software developer based out of the United States specializing in high-performance machine learning inference. I'm best known for my work with TensorRT C++ API and have created several open-source tutorial projects. For consulting work, please connect with me on LinkedIn.

yolov8-tensorrt-cpp's People

Contributors

Stargazers

Watchers

yolov8-tensorrt-cpp's Issues

[OpenCV with CUDA build error] Ambiguous overload for 'operator!='

In file included from /home/di0n/Desktop/opencv-4.8.0/modules/dnn/src/layers/normalize_bbox_layer.cpp:50:
/home/di0n/Desktop/opencv-4.8.0/modules/dnn/src/layers/../cuda4dnn/primitives/normalize_bbox.hpp: In instantiation of ‘void cv::dnn::cuda4dnn::NormalizeOp<T>::forward(const std::vector<cv::Ptr<cv::dnn::dnn4_v20230620::BackendWrapper> >&, const std::vector<cv::Ptr<cv::dnn::dnn4_v20230620::BackendWrapper> >&, cv::dnn::cuda4dnn::csl::Workspace&) [with T = __half]’:
/home/di0n/Desktop/opencv-4.8.0/modules/dnn/src/layers/../cuda4dnn/primitives/normalize_bbox.hpp:89:14:   required from here
/home/di0n/Desktop/opencv-4.8.0/modules/dnn/src/layers/../cuda4dnn/primitives/normalize_bbox.hpp:114:24: error: ambiguous overload for ‘operator!=’ (operand types are ‘__half’ and ‘double’)
  114 |             if (weight != 1.0)
      |                 ~~~~~~~^~~~~~
/home/di0n/Desktop/opencv-4.8.0/modules/dnn/src/layers/../cuda4dnn/primitives/normalize_bbox.hpp:114:24: note: candidate: ‘operator!=(int, double)’ (built-in)
/home/di0n/Desktop/opencv-4.8.0/modules/dnn/src/layers/../cuda4dnn/primitives/normalize_bbox.hpp:114:24: note: candidate: ‘operator!=(long long unsigned int, double)’ (built-in)
/home/di0n/Desktop/opencv-4.8.0/modules/dnn/src/layers/../cuda4dnn/primitives/normalize_bbox.hpp:114:24: note: candidate: ‘operator!=(long long int, double)’ (built-in)
/home/di0n/Desktop/opencv-4.8.0/modules/dnn/src/layers/../cuda4dnn/primitives/normalize_bbox.hpp:114:24: note: candidate: ‘operator!=(long unsigned int, double)’ (built-in)
/home/di0n/Desktop/opencv-4.8.0/modules/dnn/src/layers/../cuda4dnn/primitives/normalize_bbox.hpp:114:24: note: candidate: ‘operator!=(long int, double)’ (built-in)
/home/di0n/Desktop/opencv-4.8.0/modules/dnn/src/layers/../cuda4dnn/primitives/normalize_bbox.hpp:114:24: note: candidate: ‘operator!=(unsigned int, double)’ (built-in)
/home/di0n/Desktop/opencv-4.8.0/modules/dnn/src/layers/../cuda4dnn/primitives/normalize_bbox.hpp:114:24: note: candidate: ‘operator!=(float, double)’ (built-in)
In file included from /usr/local/cuda/include/cuda_fp16.h:4070,
                 from /usr/local/cuda/include/cublas_api.h:77,
                 from /usr/local/cuda/include/cublas_v2.h:69,
                 from /home/di0n/Desktop/opencv-4.8.0/modules/dnn/src/layers/../cuda4dnn/csl/cublas.hpp:14,
                 from /home/di0n/Desktop/opencv-4.8.0/modules/dnn/src/layers/../op_cuda.hpp:11,
                 from /home/di0n/Desktop/opencv-4.8.0/modules/dnn/src/layers/normalize_bbox_layer.cpp:45:
/usr/local/cuda/include/cuda_fp16.hpp:714:42: note: candidate: ‘bool operator!=(const __half&, const __half&)’
  714 | __CUDA_HOSTDEVICE__ __forceinline__ bool operator!=(const __half &lh, const __half &rh) { return __hneu(lh, rh); }
      |                                          ^~~~~~~~
gmake[2]: *** [modules/dnn/CMakeFiles/opencv_dnn.dir/build.make:1148: modules/dnn/CMakeFiles/opencv_dnn.dir/src/layers/normalize_bbox_layer.cpp.o] Error 1
gmake[1]: *** [CMakeFiles/Makefile2:5170: modules/dnn/CMakeFiles/opencv_dnn.dir/all] Error 2
gmake: *** [Makefile:166: all] Error 2

Hello, I've been trying to build OpenCV with CUDA for the last couple of hours, I have installed and set up both CUDA and cudNN before attempting to run a modified version of the bash script provided in tensorrt-cpp-api, I've modified some parameters such as the cudNN paths and the arch-bin to suit my own GPU(1050ti):

VERSION=4.8.0

test -e ${VERSION}.zip || wget https://github.com/opencv/opencv/archive/refs/tags/${VERSION}.zip
test -e opencv-${VERSION} || unzip ${VERSION}.zip

test -e opencv_extra_${VERSION}.zip || wget -O opencv_extra_${VERSION}.zip https://github.com/opencv/opencv_contrib/archive/refs/tags/${VERSION}.zip
test -e opencv_contrib-${VERSION} || unzip opencv_extra_${VERSION}.zip


cd opencv-${VERSION}
mkdir build
cd build

cmake -D CMAKE_BUILD_TYPE=RELEASE \
-D CMAKE_INSTALL_PREFIX=/usr/local \
-D WITH_TBB=ON \
-D ENABLE_FAST_MATH=1 \
-D CUDA_FAST_MATH=1 \
-D WITH_CUBLAS=1 \
-D WITH_CUDA=ON \
-D BUILD_opencv_cudacodec=ON \
-D WITH_CUDNN=ON \
-D OPENCV_DNN_CUDA=ON \
-D WITH_QT=OFF \
-D WITH_OPENGL=ON \
-D BUILD_opencv_apps=OFF \
-D BUILD_opencv_python2=OFF \
-D OPENCV_GENERATE_PKGCONFIG=ON \
-D OPENCV_PC_FILE_NAME=opencv.pc \
-D OPENCV_ENABLE_NONFREE=ON \
-D OPENCV_EXTRA_MODULES_PATH=../../opencv_contrib-${VERSION}/modules \
-D INSTALL_PYTHON_EXAMPLES=OFF \
-D INSTALL_C_EXAMPLES=OFF \
-D BUILD_EXAMPLES=OFF \
-D CUDA_ARCH_BIN=6.1 \
-D WITH_FFMPEG=ON \
-D CUDNN_INCLUDE_DIR=/usr/include \
-D CUDNN_LIBRARY=/usr/lib/x86_64-linux-gnu/libcudnn.so \
..

make -j 8
sudo make -j 8 install

Sadly every time I try to build OpenCV with CUDA enabled I run into some sort of problem, most likely being caused by the ambiguous overload I provided above, is there an official patch for this? Or should I just try to static cast the "__half” type to some other type which can be compared with a “double”? Thanks for this amazing implementation by the way, whenever I can get it to work and have some free time off school I'll integrate a python project of mine to C/C++ with it.

Regards,

Cannot detect object on Jetson AGX Orin

Hi, thank you for the amazing repository
I want to ask, i have followed your guide, but the result can't be detect object (no bounding boxes in any objects)
Here i use Jetson AGX Orin
Any help will be appreciated

YoloV8 segmentation suddenly terminated

When i run the yolov8 segmentation models in jetson orin with video file after several seconds, it suddenly terminated with error message shown below

terminate called after throwing an instance of 'cv::Exception'
  what():  OpenCV(4.8.0) /home/roisc-orin/opencv-4.8.0/modules/core/src/matrix_expressions.cpp:24: error: (-5:Bad argument) Matrix operand is an empty matrix. in function 'checkOperandsExist'

Aborted (core dumped)

but, when i run with yolov8 detection models it doesn't have problem at all. Any advice ?

Custom trained model support

Hi thank you for sharing your great work!

i am just curious that can i run executables with my custom trained model?
my model has 12 class, not like coco dataset.

i export my best(yolov8n).pt model to onnx and then i run the detect_object_video
then output is x,y, width, height are all 0 but only there is class name.

except for this everything works well!

Aborted (core dumped) terminate called after throwing an instance of 'std::runtime_error'

nvidia@ubuntu:~/Desktop/HXB/11-3/YOLOv8-TensorRT-CPP/build$ ./detect_object_image --model /home/nvidia/Desktop/HXB/11-3/yolov8n_32.onnx --input /home/nvidia/Desktop/HXB/11-3/bus2.jpg
Searching for engine file with name: yolov8n_32.engine.NVIDIATegraX2.fp16.1.1
Engine not found, generating. This could take a while...
terminate called after throwing an instance of 'std::runtime_error'
what(): Error: Unable to build the TensorRT engine. Try increasing TensorRT log severity to kVERBOSE (in /libs/tensorrt-cpp-api/engine.cpp).
Aborted (core dumped)

Support for setting class names via public method

Currently, the library lacks support for setting custom class names externally without modifying the source code and recompiling it. I suggest adding a method that allows users to set custom class names programmatically (or through arguments in the cli). I am willing to contribute to the project by creating a pr.

How to set "CUDA lazy loading" ??

I try example! "object_detection_image" and get this message

CUDA lazy loading is not enabled. Enabling it can significantly reduce device memory usage and speed up TensorRT initialization. See "Lazy Loading" section of CUDA documentation https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#lazy-loading

How to set "CUDA lazy loading" in example??

ERROR: Package 'ultralytics' requires a different Python: 3.6.15 not in '>=3.8'

I install the feat/jetson-tx2 branch, and met a conflict

Does not work for newer NVIDIA graphics cards

Since the OpenCV 4.8.0 with CUDA support is only available for CUDA < 12 it doesn't support newer graphics cards of 40 series and such. 40 series graphics cards can only be downgraded till 525.xx.xx driver version minimum and < 12 CUDA is available only below 520.xx.xx NVIDIA drivers.
More information in this slack answer

Is there any other way to get this code to work on newer graphics cards?

Getting inference data from these results

Is there any way to get the inference data from these results. Like coordinates of the bbox. Or keypoints in pose. In live video

cvShowImage function not implemented

i get this error when running the video file. can u help me with this. already tried installing libgtk2.0-dev and pkg-config

$ ./detect_object_video --model /home/inp/YOLOv8-TensorRT-CPP/models/yolov8n-pose.onnx --input 0
Searching for engine file with name: yolov8n-pose.engine.NVIDIAGraphicsDevice.fp16.1.1
Engine found, not regenerating...
Original video resolution: (640x480)
New video resolution: (1280x720)
terminate called after throwing an instance of 'cv::Exception'
  what():  OpenCV(4.9.0-dev) /home/inp/opencv/opencv/modules/highgui/src/window.cpp:1272: error: (-2:Unspecified error) The function is not implemented. Rebuild the library with Windows, GTK+ 2.x or Cocoa support. If you are on Ubuntu or Debian, install libgtk2.0-dev and pkg-config, then re-run cmake or configure script in function 'cvShowImage'

Aborted (core dumped)

Build on Windows

Is it possible to build it on Windows. I am trying to redact cmakelists.txt, findtensorrt.cmake. But I didn't get the make file. And one more question, can i use it in my projects?###
My tech specs:

Windows 10 x64
Cuda 11.8
CudaNN 8.9.1
Tensorrt 8.5.3.1

Jetson-TX2 contribution

Hi, and thanks for your work.

I updated your code to make it work on a jetson-TX2 which is compatible with:

OpenCV GSTREAMER
C++14

https://github.com/ltetrel/YOLOv8-TensorRT-CPP/tree/feat/jetson-tx2

Let me know how you want to proceed, from there I see 3 options:

Changes in another branch feat/jetson-tx2 (as I did on my fork). I recommend this and I would need you to create this branch on your repo so I can make a PR.
Changes living in main branch. I don't recommend this to avoid polluting your original code but that could be possible to have everything at the same place with a bunch of #ifdef depending on cuda version , c++ etc...
No mention to this work on your repo, even if I think it could benefit the community.

CUDA Operation not supported

Not sure what's going wrong with the GPU. I'm running CUDA 11.8

My nvidia-smi:

NVIDIA-SMI 545.23.06              Driver Version: 545.23.06    CUDA Version: 12.3     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce GTX 750 Ti      On  | 00000000:01:00.0  On |                  N/A |
|  0%   30C    P8               1W /  38W |    189MiB /  2048MiB |      4%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
                                                                                         
+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A       981      G   /usr/lib/xorg/Xorg                           43MiB |
|    0   N/A  N/A      1239      G   /usr/bin/gnome-shell                         69MiB |
|    0   N/A  N/A      4197      G   /usr/lib/xorg/Xorg                           70MiB |
+---------------------------------------------------------------------------------------+

Error Message:

Success, saved engine to yolov8n.engine.NVIDIAGeForceGTX750Ti.fp16.1.1
Loaded engine size: 18 MiB
Trying to load shared library libcudnn.so.8
Loaded shared library libcudnn.so.8
Using cuDNN as plugin tactic source
Using cuDNN as core library tactic source
[MemUsageChange] Init cuDNN: CPU +0, GPU +9, now: CPU 226, GPU 266 (MiB)
Deserialization required 15763 microseconds.
[MemUsageChange] TensorRT-managed allocation in engine deserialization: CPU +0, GPU +19, now: CPU 0, GPU 19 (MiB)
Trying to load shared library libcudnn.so.8
Loaded shared library libcudnn.so.8
Using cuDNN as plugin tactic source
Using cuDNN as core library tactic source
[MemUsageChange] Init cuDNN: CPU +0, GPU +8, now: CPU 226, GPU 267 (MiB)
Total per-runner device persistent memory is 1472512
Total per-runner host persistent memory is 126032
Allocated activation device memory of size 18329600
[MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +19, now: CPU 0, GPU 38 (MiB)
CUDA lazy loading is enabled.
CUDA operation failed with code: 801(cudaErrorNotSupported), with message: operation not supported
terminate called after throwing an instance of 'std::runtime_error'
  what():  CUDA operation failed with code: 801(cudaErrorNotSupported), with message: operation not supported
Aborted (core dumped)

is it detectObject thread safe?

I want to know whether detectObject is thread safe, that means i can call this function in multi threads

Training on a Dataset

I want to train my dataset using this repo. Which steps should I follow? I mean, the number of epochs...

#include "engine.h" where is this head file？

Cmake can’t find TensorRT

Hello,
Thank you for sharing this repository. After installing the requirements, I cloned it and built it . However, I received this error:
Could NOT find TensorRT (missing: TensorRT_LIBRARY) (found version "..")
Call Stack (most recent call first):
/home/safebot/.local/lib/python3.8/site-packages/cmake/data/share/cmake-3.27/Modules/FindPackageHandleStandardArgs.cmake:600 (_FPHSA_FAILURE_MESSAGE)
cmake/FindTensorRT.cmake:73 (FIND_PACKAGE_HANDLE_STANDARD_ARGS)
libs/tensorrt-cpp-api/CMakeLists.txt:22 (find_package)

Which is strange considering I used Debian to install TensorRT and the library gets installed by default on
set(TensorRT_DIR /usr/lib/x86_64-linux-gnu/)
And I checked that as well. I also used cmake gui to build it and I got this error after specifying /usr/lib/x86_64-linux-gnu as library directory :TensorRT_NVONNXPARSER_LIBRARY-NOTFOUND

I would appreciate it if you could help me identify the source of this issue when building the project.
Kind regards
Nima

Error from number of outputs

Within your YoloV8::detectObjects function, your code checks to make sure the number of outputs == 1 for a detection model; however, when I take my custom yolov8-trained model for detection, it has 3 outputs...thus it thinks its a segmentation model and errors out.

Here's the command I use to train my custom model:

yolo task=detect mode=train model=yolov8m.pt data=custom.yaml plots=True epochs=200 imgsz=640,480

Once trained, I use your pytorch2onnx.py script to convert it to an onnx file.

This all seems odd to me since if I take one of the yolov8 provided models, e.g. yolov8m.pt, and convert it using your script, it only shows 1 output and runs fine.

Not sure what's going on. Any ideas??

Thanks!

Modification for YOLOv7

Is it possible to remake the yolo8 .h .cpp files for support YOLOv7?

“engine.h”: No such file or directory

“engine.h”: No such file or directory。What does this file need to be installed

Cannot build yolov8n-seg.pt when using INT8

Thanks for your creative work.
I am currently using your project on Windows 10. For detection, it has worked successfully when using FP16 or INT8. For segmentation, it has worked successfully when using FP16.
I tried to build yolov8n-seg.pt in INT8 precision, but some errors occurred.

If you have any problems, I can provide you with more information.

Cannot build engine

Hi, and thank you for making this code available!

I am building and I see teh following error:

Searching for engine file with name: yolov8x.engine.NVIDIAGeForceRTX3090.fp16.1.1.2000000000
Engine not found, generating. This could take a while...
4: [network.cpp::nvinfer1::Network::validate::2671] Error Code 4: Internal Error (Network must have at least one output)
2: [builder.cpp::nvinfer1::builder::Builder::buildSerializedNetwork::636] Error Code 2: Internal Error (Assertion engine != nullptr failed. )

What might be causing this?

I have a question about this example

I create example
-> if i push button run detect for 11times for same image.

Detection time left_1_Img = "0.5520" sec
Detection time left_1_Img = "0.0840" sec
Detection time left_1_Img = "0.0200" sec
Detection time left_1_Img = "0.0190" sec
Detection time left_1_Img = "0.0200" sec
Detection time left_1_Img = "0.0190" sec
Detection time left_1_Img = "0.0190" sec
Detection time left_1_Img = "0.0190" sec
Detection time left_1_Img = "0.0190" sec
Detection time left_1_Img = "0.0200" sec
Detection time left_1_Img = "0.0190" sec

After a 1minutes.... I push button
Detection time left_1_Img = "0.2110" sec
Detection time left_1_Img = "0.1330" sec
Detection time left_1_Img = "0.0190" sec
Detection time left_1_Img = "0.0190" sec
Detection time left_1_Img = "0.0200" sec
Detection time left_1_Img = "0.0190" sec
Detection time left_1_Img = "0.0180" sec
Detection time left_1_Img = "0.0200" sec
Detection time left_1_Img = "0.0190" sec
Detection time left_1_Img = "0.0200" sec
Detection time left_1_Img = "0.0190" sec

After a 1minutes.... I push button
Detection time left_1_Img = "0.2120" sec
Detection time left_1_Img = "0.0210" sec
Detection time left_1_Img = "0.0190" sec
Detection time left_1_Img = "0.0190" sec
Detection time left_1_Img = "0.0190" sec
Detection time left_1_Img = "0.0200" sec
Detection time left_1_Img = "0.0190" sec
Detection time left_1_Img = "0.0190" sec
Detection time left_1_Img = "0.0200" sec
Detection time left_1_Img = "0.0190" sec
Detection time left_1_Img = "0.0200" sec

Is this tensorRT feature???

can i use option??

opencv2/cudaimgproc.hpp: No such file or directory

(CUDA113+CUDNN82) han@han:~/Desktop/hxb_projects/CPP_Instance/10-31/git-2/YOLOv8-TensorRT-CPP/build$ make [ 8%] Building CXX object libs/tensorrt-cpp-api/CMakeFiles/tensorrt_cpp_api.dir/src/engine.cpp.o /home/han/Desktop/hxb_projects/CPP_Instance/10-31/git-2/YOLOv8-TensorRT-CPP/libs/tensorrt-cpp-api/src/engine.cpp:7:10: fatal error: opencv2/cudaimgproc.hpp: No such file or directory 7 | #include <opencv2/cudaimgproc.hpp> | ^~~~~~~~~~~~~~~~~~~~~~~~~ compilation terminated. make[2]: *** [libs/tensorrt-cpp-api/CMakeFiles/tensorrt_cpp_api.dir/build.make:76: libs/tensorrt-cpp-api/CMakeFiles/tensorrt_cpp_api.dir/src/engine.cpp.o] Error 1 make[1]: *** [CMakeFiles/Makefile2:215: libs/tensorrt-cpp-api/CMakeFiles/tensorrt_cpp_api.dir/all] Error 2 make: *** [Makefile:91: all] Error 2

Error with Build

Hello,

i have a Problem to Build files with Visual Studio 2022
you can see the Error in Snapshot.

Can you explain me how i resolve this to build it without Errors

Thanks

How to do tracking mode?

Good day @cyrusbehr ,
This is not an issue but a question of mine. I have tried your project, and it actually has saved me a lot of time.
I am able to get the segmentation info, but now I need to get YoloV8 to work with tracking mode. Could you please tell me how to achieve that?

Thank you so much for your project, and also your time to answer this question

How to export custom trained engine file

Hello

I have a custom trained pt file of yolov8 and when I create an engine file using your codebase, it defaults to the coco dataset classes. How do I export and run an engine file which is trained on my data?

Thank you for your help

Train Using Unlabeled Dataset

Hello,
I'm trying to use pretrained yolov8n model to train my dataset (unlabeled). I code in c++, use vs code. How can I do that?
Is it mandatory to label the dataset first? I have amount of images.
Thank you so much.

Trained the model with imgz parammeter 416

Hi ,
First of all thanks for your great repo
I have worked this repo for my custom segmentation task ,Now for improving my FPS i had reduced the imgz from 640 to 416 .when i have given that .onnx model it shows error in the post processing function.
Can you please help me out.

terminate called after throwing an instance of 'std::logic_error'
what(): Output at index 0 has incorrect length
Aborted (core dumped)

Input video i am giving is 1920x1080
Board :Jetson Xavier

@cyrusbehr @z3lx

Untrusted benchmark test??

2000 iterations fast, but 100000 times, very, very slow. why??

2000 iterations Avg time: 4.8025 ms，Avg FPS:208 fps.
100000 iterations More than 5 minutes outputting results, still working...

start...
Avg Preprocess time: 0.42 ms
Avg Inference time: 1.447 ms
Avg Postprocess time: 2.915 ms

Avg Preprocess time: 0.42 ms
Avg Inference time: 1.447 ms
Avg Postprocess time: 2.915 ms

Avg Preprocess time: 0.419 ms
Avg Inference time: 1.447 ms
Avg Postprocess time: 2.914 ms

... After 1 minute...

Avg Preprocess time: 1.18 ms
Avg Inference time: 4.677 ms
Avg Postprocess time: 3.895 ms

... After 3 minutes ...

Avg Preprocess time: 1.245 ms
Avg Inference time: 4.911 ms
Avg Postprocess time: 3.894 ms

Why is it getting slower and slower??

Add support for YOLOv8-obb models

Hello!

Would it be possible to add support for yolov8-obb models?

Im happy to work on this feature as well

OpenCV: Errors During Build

Hi Cyrus, I am getting errors when I try to build openCV from tensorrt-cpp-api/scripts/build_opencv.sh

The cmake configuration compiles with few errors and ultimately I get the following error:

gt@Jetson-Orin:~/YOLOv8-TensorRT-CPP/opencv-4.8.0/build$ make -j 8
make: *** No targets specified and no makefile found. Stop.

gt@Jetson-Orin:~/YOLOv8-TensorRT-CPP/opencv-4.8.0/build$ sudo make -j 8 install
[sudo] password for gt:
make: *** No rule to make target 'install'. Stop.

My environment is as follows:

Jetson Orin Hardware
CUDA 11.4.315
cuDNN 8.6.0.166

However, I am not able to find the right directory for CUDNN_INCLUDE_DIR and CUDNN_LIBRARY to build openCV. Can you please help?

Thanks in advance,
Adhok

No objects detected, with no errors

Thank you in advance for your time and work. I am currently trying to use the YOLOv8 output into a ROS1 environment, as ROS1 needs C++.

Running Ubuntu 20 on a Jetson Orin Nano.

Following your instructions:
Everything installed successfully, and I am currently not receiving any errors.

I converted the yolov8n.pt to onnx.
When I run inference on a test image, or open my webcam I do not see any annotated results and no errors produced. Not sure where to go from here... lol? I saw the other issue/question. I do not have another Cuda enabled PC to test on. I do run other normal YOLOv8 packages not using your repository, successfully.

Output at index 0 has incorrect length Aborted (core dumped)

Hi, I'm trying to run this repo using FastSAM, I converted it using your script:

python3 pytorch2onnx.py --pt_path ../../models/FastSAM-x.pt

and tried to run inference on this image:

But when I tried to do the detection, I got an error:

$ CUDA_MODULE_LOADING=LAZY ./detect_object_image --model ../../models/FastSAM-x.onnx --input ../../imgs_tst/boxes.jpg 
Searching for engine file with name: FastSAM-x.engine.NVIDIAGeForceRTX2070Super.fp16.1.1
Engine found, not regenerating...
terminate called after throwing an instance of 'std::logic_error'
  what():  Output at index 0 has incorrect length
Aborted (core dumped)

Can you please tell me how can I fix that please? thanks.

What is the performance gain?

Hello, @cyrusbehr, I can see that you provide tensorRT model for C++ inference. Currently, I am using tensorRT's .engine model exported for python API and it's working around 60% faster (in object tracking) compared to pytorch's .pt model. I wonder if I can make it more faster (with little or no accuracy drop) with your solution. Do you have any estimates on this, like how much faster it gets and what % accuracy is lost (specific to your GPU model)? Thank you a lot!

A word of caution on this implimentation and broader yolo constructs.

I'm here as are many others I'm sure because you used ultralytics and now you want tensorrt speed with c++ without dealing with onnx frameworks etc.

Just a word of caution to passerbys, this implementation is kinda half baked. For better or worse ultralytics obstructs a lot of concepts and handles so much for you sometimes you might not (I sure didn't) understand what really goes into these models and their nuanced differences. My understanding was, and to some extent still is if I do x, I get y but I don't actually understand all the underlying concepts.

For example he says use his export script. Well no one tells you, and there are no errors if you dont use the same imgsz and then you're wondering why things are just... off.

https://docs.ultralytics.com/modes/export/

Overall, I'm starting to wonder if ultralytics is generally not so hot. Its an awesome toolchain for beginners (me), but as you try to implement in other languages you start to realize you don't actually know what you're doing.

https://docs.opencv.org/4.x/da/d9d/tutorial_dnn_yolo.html

This actually helped me just kind of bring into scope the concept of yolo models. Mind you this does not work with tensorRT, just a spot to explore other toolchains if you're here and finding walls.

edit: Yeah, even after trying a ton of stuff, this just feels WILDLY under accurate and I'm just not well versed enough quite yet to know where I'm losing it.

Unable to generate seg engine trained with Ultralytics version 8.0.183

Hello and thank you for your good job in bringing Yolov8 to the TensorRT C++ side.

I would like to help if it is possible but for now I'm facing an issue with the engine creation in case of a segmentation model. It seems that there is a missing stuff for "ConvTranspose_178 (CaskDeconvolution)" if I don't missunderstand logs.

I run the code on a TX2 board (with branch feat/jetson-tx2 obviously)
Here is the jetson environment:
$ jetson_release
Software part of jetson-stats 4.2.3 - (c) 2023, Raffaello Bonghi
Model: quill - Jetpack 4.6.4 [L4T 32.7.4]
NV Power Mode[0]: MAXN
Serial Number: [XXX Show with: jetson_release -s XXX]
Hardware:

P-Number: p3310-1000
Module: NVIDIA Jetson TX2
Platform:
Distribution: Ubuntu 18.04 Bionic Beaver
Release: 4.9.337-tegra
jtop:
Version: 4.2.3
Service: Active
Libraries:
CUDA: 10.2.300
cuDNN: 8.2.1.32
TensorRT: 8.2
VPI: 1.2.3
Vulkan: 1.2.70
OpenCV: 4.8.0 - with CUDA: YES

Here is the command I use:
./benchmark --model yolov8n_seg.onnx --input ~/workspace/ppanto_yolo/test_ressources --precision FP16 --class-names class1 class2

Here are the relevant pat of the logs.

--------------- Timing Runner: ConvTranspose_178 (CudnnDeconvolution)
CudnnDeconvolution has no valid tactics for this config, skipping
--------------- Timing Runner: ConvTranspose_178 (GemmDeconvolution)
Tactic: 0 skipped. Scratch requested: 8192000, available: 0
Fastest Tactic: -3360065831133338131 Time: inf
--------------- Timing Runner: ConvTranspose_178 (CaskDeconvolution)
CaskDeconvolution has no valid tactics for this config, skipping
*************** Autotuning format combination: Float(409600,1,5120,64) -> Float(1638400,1,10240,64) ***************
--------------- Timing Runner: ConvTranspose_178 (CudnnDeconvolution)
CudnnDeconvolution has no valid tactics for this config, skipping
--------------- Timing Runner: ConvTranspose_178 (GemmDeconvolution)
GemmDeconvolution has no valid tactics for this config, skipping
--------------- Timing Runner: ConvTranspose_178 (CaskDeconvolution)
CaskDeconvolution has no valid tactics for this config, skipping
*************** Autotuning format combination: Half(409600,6400,80,1) -> Half(1638400,25600,160,1) ***************
--------------- Timing Runner: ConvTranspose_178 (CudnnDeconvolution)
CudnnDeconvolution has no valid tactics for this config, skipping
--------------- Timing Runner: ConvTranspose_178 (GemmDeconvolution)
Tactic: 0 skipped. Scratch requested: 4096000, available: 0
Fastest Tactic: -3360065831133338131 Time: inf
--------------- Timing Runner: ConvTranspose_178 (CaskDeconvolution)
CaskDeconvolution has no valid tactics for this config, skipping
*************** Autotuning format combination: Half(204800,6400:2,80,1) -> Half(819200,25600:2,160,1) ***************
--------------- Timing Runner: ConvTranspose_178 (CudnnDeconvolution)
CudnnDeconvolution has no valid tactics for this config, skipping
--------------- Timing Runner: ConvTranspose_178 (GemmDeconvolution)
Tactic: 0 skipped. Scratch requested: 4096000, available: 0
Fastest Tactic: -3360065831133338131 Time: inf
--------------- Timing Runner: ConvTranspose_178 (CaskDeconvolution)
CaskDeconvolution has no valid tactics for this config, skipping
Deleting timing cache: 1496 entries, 2612 hits
10: [optimizer.cpp::computeCosts::2011] Error Code 10: Internal Error (Could not find any implementation for node ConvTranspose_178.)
2: [builder.cpp::buildSerializedNetwork::609] Error Code 2: Internal Error (Assertion enginePtr != nullptr failed. )
terminate called after throwing an instance of 'std::runtime_error'
  what():  Error: Unable to build the TensorRT engine. Try increasing TensorRT log severity to kVERBOSE (in /libs/tensorrt-cpp-api/engine.cpp).
Aborted (core dumped)

Do you have an idea of what I can do to get the model working right? What I don't understand is that I can export to engine using Ultralytics export and trtexec. Do you have a clue?

Best regards

could provide the test video?

dynamic batch

Hello,if I want to change bacth, do I need to add batch=6 to Model.export (format="onnx", simplify=True) when exporting the model?But there is a bug that 'Implementation currently only supports dynamic batch sizes or a fixed batch size of 1.How can I fix it ?

Detected 0 objects

Hi,

Thanks for this great repo.

So, I'm trying to use it with a YOLOV8x seg model. My model uses only one class and its input is 448x448.
I changed the options parameters as:

    int segChannels = 32;
    int segH = 112;
    int segW = 112;

   std::vector<std::string> classNames = {
        "tool"
    };

I confirm that segH and segW using Netron.

name: output0
tensor: float32[1,37,4116]

name: output1
tensor: float32[1,32,112,112]

I tested the onnx weights using Ultralytics, and I got the expected results (12 objects ). The command used was:

yolo predict task=segment model=~/Documents/tools/bkp/yolov8seg.onnx

I used the pytorch2onnx.py script to convert from pt to onnx.

from ultralytics import YOLO
import argparse

parser = argparse.ArgumentParser(description='Process pt file.')
parser.add_argument('--pt_path', help='path to pt file', required=True)
args = parser.parse_args()
# TODO: Specify which model you want to convert
# Model can be downloaded from https://github.com/ultralytics/ultralytics
model = YOLO(args.pt_path)
model.fuse()
model.info(verbose=False)  # Print model information
model.export(format="onnx", simplify=True, opset=12)

P.S: the simplify=True gave a different output... maybe you all can consider using it as False.

So... When I lunch the command: Searching for engine file with name: yolov8seg.engine.NVIDIAGeForceRTX2070.fp16.1.1

Engine found, not regenerating...
CUDA lazy loading is not enabled. Enabling it can significantly reduce device memory usage and speed up TensorRT initialization. See "Lazy Loading" section of CUDA documentation https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#lazy-loading
**Detected 0 objects**
Saved annotated image to: /home/adriano/Documents/dataset_base/o_annotated.jpg

Versions used:

TensorRT-8.6.0.12
Cuda compilation tools, release 12.2, V12.2.140
Ultralytics 8.0.208

I tested with FP32 and FP16 and I got the same result.

Does someone have some idea?

Grey scale image for infrence

Hai,
I have trained grey scale images for a segmentation model .Does your repo supports the grey scale images thanks in advance.....
@cyrusbehr @z3lx

Can I use non-square ratio (seg_w/seg_h) for segmentation?

I have a model with non-square input size (832x512, both values are divisible by 32) for segmentation. Then the output size will be 208х128. Therefore, I should change some values in yolov8.h, as shown below:

// Segmentation config options
    int segChannels = 32;
    int segH = 128;
    int segW = 208;
    float segmentationThreshold = 0.5f;

But in this case, the error appears:

  terminate called after throwing an instance of 'cv::Exception'
 what():  OpenCV(4.8.0) /tmp/opencv-4.8.0/modules/core/src/matrix.cpp:808: error: (-215:Assertion failed) 0 <= roi.x && 0 <= roi.width && roi.x + roi.width <= m.cols && 0 <= roi.y && 0 <= roi.height && roi.y + roi.height <= m.rows in function 'Mat'

When I use square input (for example, 512x512 for input, then segW x segH = 128x128 for output), then there is no error and segmentation works correctly. I noticed that the error occurs in the line dest = dest(roi); in yolov8.cpp file, because in this case dest.size(): [128 x 208] and roi.size(): [208 x 80]. Is there a way to run your segmentation component for such "non-square" width/height tensor sizes?

onnx2trt_utils.cpp:366: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.

nvidia@ubuntu:~/Desktop/HXB/11-4/YOLOv8-TensorRT-CPP/build$ ./detect_object_image --model /home/nvidia/Desktop/HXB/11-4/yolov8n_1527.onnx --input ./bus2.jpg
Searching for engine file with name: yolov8n_1527.engine.NVIDIATegraX2.fp16.1.1
Engine not found, generating. This could take a while...
onnx2trt_utils.cpp:366: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
terminate called after throwing an instance of 'std::runtime_error'
what(): Error: Unable to build the TensorRT engine. Try increasing TensorRT log severity to kVERBOSE (in /libs/tensorrt-cpp-api/engine.cpp).
Aborted (core dumped)
@ltetrel

about the classify task

I add the classify code to this project, and found the result of classification is wrong, I guess the reason comes from the converting of classify model. someone know the reason? Thank you!

cyrusbehr / yolov8-tensorrt-cpp Goto Github PK

yolov8-tensorrt-cpp's Introduction

👋 Nice to meet you！

yolov8-tensorrt-cpp's People

Contributors

Stargazers

Watchers

Forkers

yolov8-tensorrt-cpp's Issues

Recommend Projects

Recommend Topics

Recommend Org

Jobs