zhiqwang / yolort

yolort is a runtime stack for yolov5 on specialized accelerators such as tensorrt, libtorch, onnxruntime, tvm and ncnn.

Home Page: https://zhiqwang.com/yolort

License: GNU General Public License v3.0

Languages: CMake 1.24%, C++ 16.97%, Python 81.80%
Topics: libtorch, yolov5, inference, torchscript, onnx, onnxruntime, tvm, pytorch, detection, jit

yolort's Introduction

🤗 Introduction

What it is. Yet another implementation of Ultralytics's YOLOv5. yolort aims to make training and inference for the object detection task integrate more seamlessly. yolort now adopts the same model structure as the official YOLOv5. The significant difference is that we adopt a dynamic-shape mechanism, which lets us embed both pre-processing (letterbox) and post-processing (NMS) into the model graph, simplifying the deployment strategy. In this sense, yolort makes it easier and friendlier to deploy object detection on LibTorch, ONNX Runtime, TVM, TensorRT and so on.

About the code. yolort follows the design principle of DETR:

object detection should not be more difficult than classification, and should not require complex libraries for training and inference.

yolort is very simple to implement and experiment with. Do you like the implementation of torchvision's faster-rcnn, retinanet or detr? Do you like yolov5? You'll love yolort!

YOLO inference demo

🆕 What's New

  • Dec. 27, 2021. Add TensorRT C++ interface example. Thanks to Shiquan.
  • Dec. 25, 2021. Support exporting to TensorRT, and running inference with the TensorRT Python interface.
  • Sep. 24, 2021. Add ONNX Runtime C++ interface example. Thanks to Fidan.
  • Feb. 5, 2021. Add TVM compile and inference notebooks.
  • Nov. 21, 2020. Add graph visualization tools.
  • Nov. 17, 2020. Support exporting to ONNX, and running inference with the ONNX Runtime Python interface.
  • Nov. 16, 2020. Refactor YOLO modules and support dynamic shape/batch inference.
  • Nov. 4, 2020. Add LibTorch C++ inference example.
  • Oct. 8, 2020. Support exporting to TorchScript model.

🛠️ Usage

There are no extra compiled components in yolort and package dependencies are minimal, so the code is very simple to use.

Installation and Inference Examples

  • First of all, follow the official instructions to install PyTorch 1.8.0+ and torchvision 0.9.0+

  • Installation via pip

    Simple installation from PyPI

    pip install -U yolort

    Or from Source

    # clone yolort repository locally
    git clone https://github.com/zhiqwang/yolort.git
    cd yolort
    # install in editable mode
    pip install -e .
  • Install pycocotools (for evaluation on COCO):

    pip install -U 'git+https://github.com/ppwwyyxx/cocoapi.git#subdirectory=PythonAPI'
  • To read one or more image sources and detect their objects 🔥

    from yolort.models import yolov5s
    
    # Load model
    model = yolov5s(pretrained=True, score_thresh=0.45)
    model.eval()
    
    # Perform inference on an image file
    predictions = model.predict("bus.jpg")
    # Perform inference on a list of image files
    predictions = model.predict(["bus.jpg", "zidane.jpg"])

Loading via torch.hub

The models are also available via torch.hub; to load yolov5s with pretrained weights, simply do:

model = torch.hub.load("zhiqwang/yolort:main", "yolov5s", pretrained=True)

Loading checkpoint from official yolov5

The following is the interface for loading the checkpoint weights trained with ultralytics/yolov5. Please see our documents on what we share and how we differ from yolov5 for more details.

from yolort.models import YOLOv5

# Download checkpoint from https://github.com/ultralytics/yolov5/releases/download/v6.0/yolov5s.pt
ckpt_path_from_ultralytics = "yolov5s.pt"
model = YOLOv5.load_from_yolov5(ckpt_path_from_ultralytics, score_thresh=0.25)

model.eval()
img_path = "test/assets/bus.jpg"
predictions = model.predict(img_path)

🚀 Deployment

Inference on LibTorch backend

We provide a tutorial to demonstrate how the model is converted into TorchScript, along with a C++ example of how to run inference with the serialized TorchScript model.
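For orientation, the serialization step itself is short. Below is a minimal sketch (the tutorial covers the exact options); yolort's models are scriptable end to end:

import torch

from yolort.models import yolov5s

model = yolov5s(pretrained=True, score_thresh=0.45)
model.eval()

# Script the model and serialize it for LibTorch; scripting (rather than
# tracing) preserves the dynamic-shape control flow in the graph
scripted_model = torch.jit.script(model)
scripted_model.save("yolov5s.torchscript.pt")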

Inference on ONNX Runtime backend

We provide a pipeline for deploying yolort with ONNX Runtime.

from yolort.runtime import PredictorORT

# Load the serialized ONNX model
engine_path = "yolov5n6.onnx"
y_runtime = PredictorORT(engine_path, device="cpu")

# Perform inference on an image file
predictions = y_runtime.predict("bus.jpg")

Please check out this tutorial to use yolort's ONNX model conversion and ONNX Runtime inference. There is also an example of the ONNX Runtime C++ interface.
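For a rough idea of what the conversion involves, here is a sketch using torch.onnx.export directly; the input/output names, opset version and dynamic axes below are illustrative assumptions, so consult the tutorial for the exact arguments yolort uses:

import torch

from yolort.models import yolov5s

model = yolov5s(pretrained=True, score_thresh=0.45)
model.eval()

# yolort consumes a list of [C, H, W] tensors, so the dummy input is a list
images = [torch.rand(3, 640, 640)]
torch.onnx.export(
    model,
    (images,),
    "yolov5n6.onnx",
    opset_version=11,
    input_names=["images_tensors"],
    output_names=["scores", "labels", "boxes"],
    dynamic_axes={"images_tensors": [1, 2]},  # dynamic height and width
)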

Inference on TensorRT backend

The pipeline for TensorRT deployment is also very easy to use.

import torch
from yolort.runtime import PredictorTRT

# Load the serialized TensorRT engine
engine_path = "yolov5n6.engine"
device = torch.device("cuda")
y_runtime = PredictorTRT(engine_path, device=device)

# Perform inference on an image file
predictions = y_runtime.predict("bus.jpg")

Besides, we provide a tutorial detailing yolort's model conversion to TensorRT and the use of the Python interface. Please check this example if you want to use the C++ interface.
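For context, building a serialized engine from an exported ONNX graph typically looks like the sketch below. This is the generic TensorRT 8.x Python workflow, not yolort's exact tooling; the tutorial wraps these steps for you:

import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
# Parse the ONNX graph into an explicit-batch TensorRT network
flags = 1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)
network = builder.create_network(flags)
parser = trt.OnnxParser(network, logger)

with open("yolov5n6.onnx", "rb") as f:
    if not parser.parse(f.read()):
        raise RuntimeError(parser.get_error(0))

# Build and save the serialized engine consumed by PredictorTRT above
config = builder.create_builder_config()
engine_bytes = builder.build_serialized_network(network, config)
with open("yolov5n6.engine", "wb") as f:
    f.write(engine_bytes)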

🎨 Model Graph Visualization

Now, yolort can draw the model graph directly; check out our tutorial to see how to use and visualize the model graph.

YOLO model visualize

👋 Contributing

We love your input! Please see our Contributing Guide to get started and learn how to help out. Thank you to all our contributors! If you like this project, please consider giving this repo a ⭐, as it is the simplest way to support us.

Contributors

📖 Citing yolort

If you use yolort in your publication, please cite it by using the following BibTeX entry.

@Misc{yolort2021,
  author =       {Zhiqiang Wang and Song Lin and Shiquan Yu and Wei Zeng and Fidan Kharrasov},
  title =        {YOLORT: A runtime stack for object detection on specialized accelerators},
  howpublished = {\url{https://github.com/zhiqwang/yolort}},
  year =         {2021}
}

🎓 Acknowledgement

  • The implementation of yolov5 borrows code from ultralytics.
  • This repo borrows the architecture design and part of the code from torchvision.

yolort's People

Contributors

bobinmathew, datumbox, deepage, dependabot[bot], dkloving, gillpeacegood, iamnaqi, itsnine, jingxianke, lauhsu, lishufei0505, liuxubit, liuzhuang1024, luwill6, mattpopovich, ncnnnnn, nihui, pre-commit-ci[bot], runrunrun1994, sauravmaheshkar, shining-love, shiquanyu, stereomatchingkiss, tomakko, triple-mu, wulingtian, xiaotaofenghcc, xiguadong, yoh-z, zhiqwang


yolort's Issues

Error on converting a custom yolov5s to yolort

I'm getting this error when converting a customized yolov5 model to yolort:

Traceback (most recent call last):
  File "convert_ultralytics_to_yolort.py", line 31, in <module>
    num_classes=80)
  File "/yolov5/yolort/utils/update_module_state.py", line 74, in update_module_state_from_ultralytics
    module_state_updater.updating(model)
  File "/yolov5/yolort/utils/update_module_state.py", line 111, in updating
    self.attach_parameters_block(state_dict, name, None))
  File "/yolov5/yolort/utils/update_module_state.py", line 147, in attach_parameters_block
    return rgetattr(state_dict[ind], keys[1:])
  File "/yolov5/yolort/utils/update_module_state.py", line 162, in rgetattr
    return reduce(_getattr, [obj] + attr)
  File "/yolov5/yolort/utils/update_module_state.py", line 161, in _getattr
    return getattr(obj, attr, *args)
  File "/virtualenv/v1/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1131, in __getattr__
    type(self).__name__, name))
AttributeError: 'Conv' object has no attribute 'bn'

Inconsistency with the latest version of yolov5

It seems that there is a bug caused by incompatibility with the latest version of YOLOv5 after ultralytics/yolov5#4833 .

Discussed in #233

Originally posted by 123556666 November 22, 2021

I got a RuntimeError when loading custom model trained with the latest version of YOLOv5.

RuntimeError: Error(s) in loading state_dict for Model:
	Missing key(s) in state_dict: "model.33.anchor_grid".

Yolox model not supported

While loading an exported yolox TorchScript model, I am getting this error:
NotImplementedError: Currently doesn't support architecture with depth: 1.33 and 1.25, feel free to create a ticket labeled enhancement to us

Could you please add support for larger models like yolox?

Add Chinese Docs

📚 Documentation

This would of course be friendly to Chinese readers.

Unable to run ./yolo_inference on GPU

Hi, thanks for putting this repo together. I am working with it because I am trying to run inference with my yolov5 model in C++, with pre- and post-processing on the GPU, as I mentioned here.

I converted my model from yolov5 to yolov5-rt-stack and it seemed to work without issue, but I was having issues trying to run it. Before diving into that issue too deeply, I decided to try and run your sample code first to see if that worked.

I followed your README and I was able to run inference via CPU without issue. However, when I try to run using the --gpu flag, I get the following error:

Click to display error

root@pc:/home/user/git/yolov5-rt-stack/deployment/build# ./yolo_inference --input_source /path/to/dog.jpg --checkpoint ../../test/tracing/yolov5s.torchscript.pt --labelmap ../../notebooks/assets/coco.names --gpu 
>>> Set GPU mode
>>> Loading model
>>> Model loaded
>>> Run once on empty image
[W TensorImpl.h:1153] Warning: Named tensors and all their associated APIs are an experimental feature and subject to change. Please do not use them for anything important until they are released as stable. (function operator())
terminate called after throwing an instance of 'c10::NotImplementedError'
  what():  The following operation failed in the TorchScript interpreter.
Traceback of TorchScript, serialized code (most recent call last):
  File "code/__torch__/yolort/models/yolo_module.py", line 31, in forward
    inputs: List[Tensor],
    targets: Optional[List[Dict[str, Tensor]]]=None) -> Tuple[Dict[str, Tensor], List[Dict[str, Tensor]]]:
    _0 = (self)._forward_impl(inputs, targets, )
          ~~~~~~~~~~~~~~~~~~~ <--- HERE
    return _0
  def _forward_impl(self: __torch__.yolort.models.yolo_module.YOLOModule,
  File "code/__torch__/yolort/models/yolo_module.py", line 51, in _forward_impl
    _4 = (self.transform).forward(inputs, targets, )
    samples, targets0, = _4
    outputs = (self.model).forward(samples.tensors, targets0, )
               ~~~~~~~~~~~~~~~~~~~ <--- HERE
    losses = annotate(Dict[str, Tensor], {})
    detections = annotate(List[Dict[str, Tensor]], [])
  File "code/__torch__/yolort/models/box_head.py", line 293, in forward
      _105 = annotate(List[Optional[Tensor]], [inds, labels])
      scores0 = torch.index(scores, _105)
      keep = _92(boxes0, scores0, labels, self.nms_thresh, )
             ~~~ <--- HERE
      keep0 = torch.slice(keep, 0, None, self.detections_per_img)
      _106 = annotate(List[Optional[Tensor]], [keep0])
  File "code/__torch__/torchvision/ops/boxes.py", line 16, in batched_nms
    _5 = torch.unsqueeze(torch.slice(offsets), 1)
    boxes_for_nms = torch.add(boxes, _5)
    keep = __torch__.torchvision.ops.boxes.nms(boxes_for_nms, scores, iou_threshold, )
           ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
    _0 = keep
  return _0
  File "code/__torch__/torchvision/ops/boxes.py", line 87, in nms
  _16 = __torch__.torchvision.extension._assert_has_ops
  _17 = _16()
  _18 = ops.torchvision.nms(boxes, scores, iou_threshold)
        ~~~~~~~~~~~~~~~~~~~ <--- HERE
  return _18

Traceback of TorchScript, original code (most recent call last):
  File "/home/user/git/yolov5-rt-stack/yolort/models/yolo_module.py", line 137, in forward
        ``training_step``). We keep ``targets`` here for Backward Compatible.
        """
        return self._forward_impl(inputs, targets)
               ~~~~~~~~~~~~~~~~~~ <--- HERE
  File "/home/user/git/yolov5-rt-stack/yolort/models/box_head.py", line 376, in forward
    
            # non-maximum suppression, independently done per level
            keep = batched_nms(boxes, scores, labels, self.nms_thresh)
                   ~~~~~~~~~~~ <--- HERE
            # keep only topk scoring head_outputs
            keep = keep[:self.detections_per_img]
  File "/usr/local/lib/python3.8/dist-packages/torchvision-0.8.0a0+2f40a48-py3.8-linux-x86_64.egg/torchvision/ops/boxes.py", line 42, in nms
    """
    _assert_has_ops()
    return torch.ops.torchvision.nms(boxes, scores, iou_threshold)
           ~~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
RuntimeError: Could not run 'torchvision::nms' with arguments from the 'CUDA' backend. This could be because the operator doesn't exist for this backend, or was omitted during the selective/custom build process (if using custom build). If you are a Facebook employee using PyTorch on mobile, please visit https://fburl.com/ptmfixes for possible resolutions. 'torchvision::nms' is only available for these backends: [CPU, BackendSelect, Named, ADInplaceOrView, AutogradOther, AutogradCPU, AutogradCUDA, AutogradXLA, UNKNOWN_TENSOR_TYPE_ID, AutogradMLC, Tracer, Autocast, Batched, VmapMode].

CPU: registered at /resources/vision/torchvision/csrc/vision.cpp:59 [kernel]
BackendSelect: fallthrough registered at /resources/pytorch/aten/src/ATen/core/BackendSelectFallbackKernel.cpp:3 [backend fallback]
Named: registered at /resources/pytorch/aten/src/ATen/core/NamedRegistrations.cpp:7 [backend fallback]
ADInplaceOrView: fallthrough registered at /resources/pytorch/aten/src/ATen/core/VariableFallbackKernel.cpp:60 [backend fallback]
AutogradOther: fallthrough registered at /resources/pytorch/aten/src/ATen/core/VariableFallbackKernel.cpp:35 [backend fallback]
AutogradCPU: fallthrough registered at /resources/pytorch/aten/src/ATen/core/VariableFallbackKernel.cpp:39 [backend fallback]
AutogradCUDA: fallthrough registered at /resources/pytorch/aten/src/ATen/core/VariableFallbackKernel.cpp:47 [backend fallback]
AutogradXLA: fallthrough registered at /resources/pytorch/aten/src/ATen/core/VariableFallbackKernel.cpp:51 [backend fallback]
UNKNOWN_TENSOR_TYPE_ID: fallthrough registered at /resources/pytorch/aten/src/ATen/core/VariableFallbackKernel.cpp:43 [backend fallback]
AutogradMLC: fallthrough registered at /resources/pytorch/aten/src/ATen/core/VariableFallbackKernel.cpp:55 [backend fallback]
Tracer: fallthrough registered at /resources/pytorch/torch/csrc/jit/frontend/tracer.cpp:1036 [backend fallback]
Autocast: fallthrough registered at /resources/pytorch/aten/src/ATen/autocast_mode.cpp:255 [backend fallback]
Batched: registered at /resources/pytorch/aten/src/ATen/BatchingRegistrations.cpp:1019 [backend fallback]
VmapMode: fallthrough registered at /resources/pytorch/aten/src/ATen/VmapModeRegistrations.cpp:33 [backend fallback]


Exception raised from reportError at /resources/pytorch/aten/src/ATen/core/dispatch/OperatorEntry.cpp:392 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0x6c (0x7fd3681ff7ac in /usr/local/lib/python3.8/dist-packages/torch/lib/libc10.so)
frame #1: <unknown function> + 0x9d73bf (0x7fd35aed33bf in /usr/local/lib/python3.8/dist-packages/torch/lib/libtorch_cpu.so)
frame #2: <unknown function> + 0x1047ef6 (0x7fd35b543ef6 in /usr/local/lib/python3.8/dist-packages/torch/lib/libtorch_cpu.so)
frame #3: <unknown function> + 0x4206df7 (0x7fd35e702df7 in /usr/local/lib/python3.8/dist-packages/torch/lib/libtorch_cpu.so)
frame #4: <unknown function> + 0x4005af6 (0x7fd35e501af6 in /usr/local/lib/python3.8/dist-packages/torch/lib/libtorch_cpu.so)
frame #5: torch::jit::InterpreterState::run(std::vector<c10::IValue, std::allocator<c10::IValue> >&) + 0x30 (0x7fd35e4f1480 in /usr/local/lib/python3.8/dist-packages/torch/lib/libtorch_cpu.so)
frame #6: <unknown function> + 0x3fec2ee (0x7fd35e4e82ee in /usr/local/lib/python3.8/dist-packages/torch/lib/libtorch_cpu.so)
frame #7: torch::jit::GraphFunction::operator()(std::vector<c10::IValue, std::allocator<c10::IValue> >, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, c10::IValue, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, c10::IValue> > > const&) + 0x3e (0x7fd35e242a7e in /usr/local/lib/python3.8/dist-packages/torch/lib/libtorch_cpu.so)
frame #8: torch::jit::Method::operator()(std::vector<c10::IValue, std::allocator<c10::IValue> >, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, c10::IValue, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, c10::IValue> > > const&) + 0x168 (0x7fd35e253198 in /usr/local/lib/python3.8/dist-packages/torch/lib/libtorch_cpu.so)
frame #9: <unknown function> + 0x6afd7 (0x5599ef5a0fd7 in ./yolo_inference)
frame #10: <unknown function> + 0x5caf5 (0x5599ef592af5 in ./yolo_inference)
frame #11: __libc_start_main + 0xf3 (0x7fd3165060b3 in /usr/lib/x86_64-linux-gnu/libc.so.6)
frame #12: <unknown function> + 0x59a9e (0x5599ef58fa9e in ./yolo_inference)

Aborted (core dumped)

I think the main thing to note in that error log is the following:

RuntimeError: Could not run 'torchvision::nms' with arguments from the 'CUDA' backend. This could be because the operator doesn't exist for this backend, or was omitted during the selective/custom build process (if using custom build). If you are a Facebook employee using PyTorch on mobile, please visit https://fburl.com/ptmfixes for possible resolutions. 'torchvision::nms' is only available for these backends: [CPU, BackendSelect, Named, ADInplaceOrView, AutogradOther, AutogradCPU, AutogradCUDA, AutogradXLA, UNKNOWN_TENSOR_TYPE_ID, AutogradMLC, Tracer, Autocast, Batched, VmapMode].

CPU: registered at /resources/vision/torchvision/csrc/vision.cpp:59 [kernel]

My takeaway from that is either I am building TorchVision for CPU and not CUDA... or torchvision::nms does not support CUDA?

Click to show my environment:

root@pc:/home/user/git/yolov5-rt-stack# python3 -m torch.utils.collect_env
Collecting environment information...
PyTorch version: 1.8.0a0+56b43f4
Is debug build: False
CUDA used to build PyTorch: 11.2
ROCM used to build PyTorch: N/A

OS: Ubuntu 20.04.1 LTS (x86_64)
GCC version: (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0
Clang version: Could not collect
CMake version: version 3.16.3

Python version: 3.8 (64-bit runtime)
Is CUDA available: True
CUDA runtime version: Could not collect
GPU models and configuration: 
GPU 0: GeForce GTX 1080
GPU 1: GeForce GTX 1080
GPU 2: GeForce GTX 1080

Nvidia driver version: 460.84
cuDNN version: Probably one of the following:
/usr/lib/x86_64-linux-gnu/libcudnn.so.8.1.0
/usr/lib/x86_64-linux-gnu/libcudnn_adv_infer.so.8.1.0
/usr/lib/x86_64-linux-gnu/libcudnn_adv_train.so.8.1.0
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_infer.so.8.1.0
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_train.so.8.1.0
/usr/lib/x86_64-linux-gnu/libcudnn_ops_infer.so.8.1.0
/usr/lib/x86_64-linux-gnu/libcudnn_ops_train.so.8.1.0
HIP runtime version: N/A
MIOpen runtime version: N/A

Versions of relevant libraries:
[pip3] numpy==1.21.0
[pip3] pytorch-lightning==1.3.8
[pip3] torch==1.8.0a0+56b43f4
[pip3] torchmetrics==0.4.1
[pip3] torchvision==0.8.0a0+45f960c
[conda] Could not collect

I installed TorchVision via your instructions listed under number 2 here. I've tried checking out release/0.8.0, v0.8.1, and v0.8.2 all with the same issue. I've also tried v0.9.0 and v0.10.0 but your build instructions do not work for them so I ignored them for the time being.

Also worth noting: there are two dependencies where our setups differ:

  • Me
    • CUDA 11.2
    • Ubuntu 20.04
  • You
    • CUDA 10.2
    • Ubuntu 18.04

Similar issues that I've found:
pytorch/vision#3058
WongKinYiu/PyTorch_YOLOv4#169

Any thoughts or ideas? Does the --gpu flag work for you?

Thanks,

Matt

Edited by @zhiqwang, updating some links in deployment.

Update libtorch interfaces of TorchVision to 0.9.0+

TorchVision updated the C++ interface in pytorch/vision#3146, so the code in the development tree (82d6afb) only works with PyTorch 1.7.x and TorchVision 0.8.x. The libtorch usage differs between TorchVision 0.8.x and 0.9.x, and TorchVision refactored these C++ interfaces half a year ago. Our plan is to update the libtorch interface to PyTorch 1.8.0+ and TorchVision 0.9.0+ in the next release (v0.4.1) for better maintainability.

Originally posted by @zhiqwang in #132 (comment)

`SPPF` will generate nodes with duplicate names

🐛 Describe the bug

Somewhere between these two commits there was a model backbone change: 06022fd...e3e18f2. The three MaxPool2d modules at backbone.body.8.m.0, backbone.body.8.m.1 and backbone.body.8.m.2 go from being parallel to being serialized: in the later commit they become a single backbone.body.8.m with three outputs. (I'm using nni==2.4 to prune the model, and a node with three outputs is a problem for that.)

For example, the SMALL model.
On the left: commit hash 06022fd, default upstream_version = r4.0
On the right: commit hash e3e18f2, default value for upstream_version changed with the addition of r6.0, so I set the upstream_version=r4.0 explicitly.
[image: side-by-side comparison of the model graphs at the two commits]
I see the same behavior for 'yolov5s', 'yolov5m', 'yolov5l'

All else in the models remained the same, therefore I wondered if this was accidental.

Versions

Collecting environment information...
PyTorch version: 1.7.1
Is debug build: False
CUDA used to build PyTorch: 10.2
ROCM used to build PyTorch: N/A

OS: Ubuntu 20.04.3 LTS (x86_64)
GCC version: (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0
Clang version: Could not collect
CMake version: version 3.14.4
Libc version: glibc-2.31

Python version: 3.8.10 (default, Jun  2 2021, 10:49:15)  [GCC 9.4.0] (64-bit runtime)
Python platform: Linux-4.14.252-195.483.amzn2.x86_64-x86_64-with-glibc2.29
Is CUDA available: True
CUDA runtime version: 11.2.142
GPU models and configuration: GPU 0: Tesla K80
Nvidia driver version: 450.142.00
cuDNN version: Probably one of the following:
/usr/lib/x86_64-linux-gnu/libcudnn.so.8.1.1
/usr/lib/x86_64-linux-gnu/libcudnn_adv_infer.so.8.1.1
/usr/lib/x86_64-linux-gnu/libcudnn_adv_train.so.8.1.1
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_infer.so.8.1.1
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_train.so.8.1.1
/usr/lib/x86_64-linux-gnu/libcudnn_ops_infer.so.8.1.1
/usr/lib/x86_64-linux-gnu/libcudnn_ops_train.so.8.1.1
HIP runtime version: N/A
MIOpen runtime version: N/A

Versions of relevant libraries:
[pip3] efficientnet-pytorch==0.6.3
[pip3] mypy==0.910
[pip3] mypy-extensions==0.4.3
[pip3] numpy==1.19.5
[pip3] numpydoc==1.1.0
[pip3] pytorch-lightning==1.5.2
[pip3] pytorchcv==0.0.58
[pip3] segmentation-models-pytorch==0.2.1
[pip3] torch==1.7.1
[pip3] torchinfo==1.5.3
[pip3] torchmetrics==0.6.0
[pip3] torchvision==0.8.2
[conda] Could not collect

Add ONNXRuntime Python interface

🚀 The feature

Add ONNXRuntime Python interface.

Motivation, pitch

My current plan is to put the core code for ORT inference into the directory yolort/runtime, and to name the Python script ort_modeling.py (other suggestions are welcome). We plan to add more Python inference backends to this directory in the future, and then add a CLI tool named run_model.py (just for example) to the tools directory for use cases.

cc @itsnine

Originally posted by @zhiqwang in #176 (comment)

cannot export custom yolov5 .pt model

Hello @zhiqwang. Thanks for the awesome work.
I successfully exported yolov5s_pretrained.pt to an ONNX model, but I cannot export my custom yolov5l.pt model.
How can I export my custom model?

`cxxopts` will cause problems in several scenarios

🐛 Bug

When I run the code from https://github.com/yasenh/libtorch-yolov5.git, the program's memory keeps increasing. After debugging, I found the key is that cxxopts conflicts with libtorch.

When I use cxxopts in my main.cpp and load a TorchScript model with torch::jit::load, the bug sometimes appears. The program's memory increases quickly until the program is killed by the operating system, with output like:

Process finished with exit code 9

When I dropped cxxopts and used gflags instead, the bug disappeared.

So I think there is some conflict between libtorch and cxxopts.

Originally posted by @liubamboo in yasenh/libtorch-yolov5#28 (comment)

Pitch

Let's replace it with a better argument-parsing method.

Pretrained model of the DarkNet Backbone

🚀 Feature

A pre-trained model is currently missing for the DarkNet module, as shown below:

https://github.com/zhiqwang/yolov5-rt-stack/blob/3171d65d2f30397b1811d755b5c923bfbc530d10/models/darknet.py#L46-L56

The best way is to train a model on the ImageNet dataset, but the cost is relatively high. An alternative is to directly cut out the backbone branch of the detection model, like

https://github.com/zhiqwang/yolov5-rt-stack/blob/3171d65d2f30397b1811d755b5c923bfbc530d10/utils/updated_checkpoint.py#L8-L16

Motivation

Pre-trained backbone models are very helpful for accelerating convergence, but some studies have found that training from scratch can yield better accuracy (literature references needed).

Make the anchor configuration mechanism more adaptable.

🐛 Bug

It seems that yolort only supports anchors of length 3. For example, it raises a RuntimeError if we directly load the VOC checkpoint, which uses anchors of length 4 by default. Maybe we should make yolort's anchor configuration mechanism more adaptable.

To Reproduce (REQUIRED)

from yolort.models import YOLOv5

# Downloaded from https://github.com/ultralytics/yolov5/releases/download/v5.0/yolov5s-VOC.pt
ckpt_path_voc = 'yolov5s-VOC.pt'

model = YOLOv5.load_from_yolov5(ckpt_path_voc, score_thresh=0.25, version="r4.0")
model.eval()

img_path = 'test/assets/bus.jpg'
predictions = model.predict(img_path)

Reduce dependencies for inferencing only

🚀 The feature

In version 0.5.x we have full support for YOLOv5, at the cost of introducing more dependencies. If someone just wants to use the models for inference, he/she doesn't need all additional features like:

  • #355
  • profiling
  • type conversions to pandas

However, those useful features are (partially) built into the model architecture. Therefore he/she needs to install additional libraries like

  • matplotlib
  • seaborn
  • pandas

even if he/she doesn't use them at all. Also downloading the additional font (Arial.ttf) can be a problem in restricted environments.

I think it would be nice to have an operable core model that runs with only a subset of required dependencies.

Motivation, pitch

I'd suggest disentangling the core model (inference only) from the additional features (plotting, profiling, ...).

Alternatives

Leave it as it is and require all dependencies

Additional context

Originally posted by @maxstrobel in ultralytics/yolov5#4664 (comment)

Support dynamic batch inference with onnx/onnxruntime

🚀 Feature

Support dynamic batch inference with onnx/onnxruntime.

Motivation

As @makaveli10 pointed out in #39 (comment), the current onnx/onnxruntime mechanism only supports dynamic-shape inference, not dynamic batch sizes.

I don't know how to implement dynamic batch inference; any help is welcome here.

[RFC] Support training with COCO and VOC datasets

🚀 Feature

Support training with COCO and VOC datasets

Motivation

Although ultralytics/yolov5 supports training and its results are awesome, I have decided to natively support training with the COCO and VOC datasets in yolov5-rt-stack (aka yolort).

Pitch

  • Fix dataloader of COCO format
  • Fix dataloader of VOC format
  • Fix loss computation

Refactor darknet backbone

🚀 Feature

Refactor darknet backbone as separate modules

Motivation

To make the model structure more controllable, I decided to abandon Ultralytics's approach of loading the model from YAML config files and to refactor the darknet backbone into separate modules.

In particular, parse_model() as below contains the DarkNet, BackboneWithPAN, and PathAggregationNetwork modules, so first I will make these modules more visible and clear.

https://github.com/zhiqwang/yolov5-rt-stack/blob/9104887c4cf10061585eb5b8982270790437013d/models/backbone.py#L79

These modifications would change the checkpoint keys in state_dict, so it's also necessary to revise the checkpoint update scripts.

Pitch

  • Refactor DarkNet
  • Refactor BackboneWithPAN
  • Refactor PathAggregationNetwork
  • Modify checkpoint update scripts

Alternatives

I found several works on this, like:

Additional context

Also follow the philosophy of torchvision especially for:

Add pylint unittest

🚀 Feature

Add flake8 / pylint Python unit tests

Motivation

Make the code more robust and maintainable.

Error loading custom yolov5-rt-stack model in C++ converted from yolov5_v4.0 - file not found: archive/constants.pkl

🐛 Bug

After training a yolov5-v4.0 model, I took its best.pt weights and converted them to yolov5-rt-stack weights in Python via update_module_state_from_ultralytics(). I then passed these yolov5-rt-stack weights as an argument to the stock ./yolo_inference program.
Is that the correct procedure?

I get the following error:

root@pc:~yolov5-rt-stack/deployment/libtorch/build# ./yolo_inference --input_source path/to/jpg --checkpoint path/to/yolov5-rt-stack-yolov5_v4-model.pt --labelmap path/to/names 
>>> Set CPU mode
>>> Loading model
>>> Error loading the model: [enforce fail at inline_container.cc:222] . file not found: archive/constants.pkl
frame #0: c10::ThrowEnforceNotMet(char const*, int, char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, void const*) + 0x68 (0x7f235b0bda28 in /usr/local/lib/python3.8/dist-packages/torch/lib/libc10.so)
frame #1: caffe2::serialize::PyTorchStreamReader::getRecordID(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) + 0xda (0x7f23506bb70a in /usr/local/lib/python3.8/dist-packages/torch/lib/libtorch_cpu.so)
frame #2: caffe2::serialize::PyTorchStreamReader::getRecord(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) + 0x38 (0x7f23506bb768 in /usr/local/lib/python3.8/dist-packages/torch/lib/libtorch_cpu.so)
frame #3: torch::jit::readArchiveAndTensors(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, c10::optional<std::function<c10::StrongTypePtr (c10::QualifiedName const&)> >, c10::optional<std::function<c10::intrusive_ptr<c10::ivalue::Object, c10::detail::intrusive_target_default_null_type<c10::ivalue::Object> > (c10::StrongTypePtr, c10::IValue)> >, c10::optional<c10::Device>, caffe2::serialize::PyTorchStreamReader&) + 0xab (0x7f2351d732db in /usr/local/lib/python3.8/dist-packages/torch/lib/libtorch_cpu.so)
frame #4: <unknown function> + 0x3d01835 (0x7f2351d73835 in /usr/local/lib/python3.8/dist-packages/torch/lib/libtorch_cpu.so)
frame #5: <unknown function> + 0x3d04013 (0x7f2351d76013 in /usr/local/lib/python3.8/dist-packages/torch/lib/libtorch_cpu.so)
frame #6: torch::jit::load(std::shared_ptr<caffe2::serialize::ReadAdapterInterface>, c10::optional<c10::Device>, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > >&) + 0x1ab (0x7f2351d7710b in /usr/local/lib/python3.8/dist-packages/torch/lib/libtorch_cpu.so)
frame #7: torch::jit::load(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, c10::optional<c10::Device>, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > >&) + 0xc2 (0x7f2351d78dd2 in /usr/local/lib/python3.8/dist-packages/torch/lib/libtorch_cpu.so)
frame #8: torch::jit::load(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, c10::optional<c10::Device>) + 0x6a (0x7f2351d78eba in /usr/local/lib/python3.8/dist-packages/torch/lib/libtorch_cpu.so)
frame #9: <unknown function> + 0x24709 (0x562e7d327709 in ./yolo_inference)
frame #10: __libc_start_main + 0xf3 (0x7f230cc060b3 in /usr/lib/x86_64-linux-gnu/libc.so.6)
frame #11: <unknown function> + 0x2315e (0x562e7d32615e in ./yolo_inference)

I do not have any errors and everything runs correctly when using yolov5-rt-stack/test/tracing/yolov5s.torchscript.pt weights with ./yolo_inference, so I believe I have everything installed, compiling, and running correctly.

Expected behavior

./yolo_inference runs correctly, finishes, and outputs detections.

Environment

Click to display environment

root@pc:~# python3 -m torch.utils.collect_env
Collecting environment information...
PyTorch version: 1.8.0a0+56b43f4
Is debug build: False
CUDA used to build PyTorch: 11.2
ROCM used to build PyTorch: N/A

OS: Ubuntu 20.04.1 LTS (x86_64)
GCC version: (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0
Clang version: Could not collect
CMake version: version 3.16.3

Python version: 3.8 (64-bit runtime)
Is CUDA available: True
CUDA runtime version: Could not collect
GPU models and configuration: 
GPU 0: GeForce GTX 1080
GPU 1: GeForce GTX 1080
GPU 2: GeForce GTX 1080

Nvidia driver version: 460.84
cuDNN version: Probably one of the following:
/usr/lib/x86_64-linux-gnu/libcudnn.so.8.1.0
/usr/lib/x86_64-linux-gnu/libcudnn_adv_infer.so.8.1.0
/usr/lib/x86_64-linux-gnu/libcudnn_adv_train.so.8.1.0
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_infer.so.8.1.0
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_train.so.8.1.0
/usr/lib/x86_64-linux-gnu/libcudnn_ops_infer.so.8.1.0
/usr/lib/x86_64-linux-gnu/libcudnn_ops_train.so.8.1.0
HIP runtime version: N/A
MIOpen runtime version: N/A

Versions of relevant libraries:
[pip3] numpy==1.21.0
[pip3] pytorch-lightning==1.3.8
[pip3] torch==1.8.0a0+56b43f4
[pip3] torchmetrics==0.4.1
[pip3] torchvision==0.9.0a0+01dfa8e
[conda] Could not collect

  • PyTorch / torchvision Version (e.g., 1.0 / 0.4.0): 1.8.0 / 0.9.0
  • OS (e.g., Linux): Ubuntu 20.04
  • How you installed PyTorch / torchvision (conda, pip, source): source / source
  • Python version: 3.8
  • CUDA/cuDNN version: 11.2
  • GPU models and configuration: 3x GeForce GTX 1080

Additional context

I trained my yolov5_v4.0 model with the following command (on 4x Tesla V100's):
train.py --local_rank=0 --img 512 --batch 128 --epochs 500 --data path/to/yaml --weights --cfg models/yolov5s.yaml --device 0,1,2,3 --name experiment-2

cannot import name 'batched_nms'

🐛 Bug

The following error appears when I run from yolort.models import yolov5s:

File "/data/sourabh/Play_n_n_Learn/CV_DL/yolov5_exp/yolov5-rt-stack/yolort/models/box_head.py", line 6, in <module>
  from torchvision.ops import batched_nms
ImportError: cannot import name 'batched_nms'

To Reproduce (REQUIRED)

Installation:
pip3 install -e .

Run:

python3
Python 3.6.9 (default, Jul 17 2020, 12:50:27) 
[GCC 8.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from yolort.models import yolov5s

Below is the error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/data/sourabh/Play_n_n_Learn/CV_DL/yolov5_exp/yolov5-rt-stack/yolort/__init__.py", line 3, in <module>
    from yolort import models
  File "/data/sourabh/Play_n_n_Learn/CV_DL/yolov5_exp/yolov5-rt-stack/yolort/models/__init__.py", line 5, in <module>
    from .yolo_module import YOLOModule
  File "/data/sourabh/Play_n_n_Learn/CV_DL/yolov5_exp/yolov5-rt-stack/yolort/models/yolo_module.py", line 9, in <module>
    from . import yolo
  File "/data/sourabh/Play_n_n_Learn/CV_DL/yolov5_exp/yolov5-rt-stack/yolort/models/yolo.py", line 12, in <module>
    from .box_head import YoloHead, SetCriterion, PostProcess
  File "/data/sourabh/Play_n_n_Learn/CV_DL/yolov5_exp/yolov5-rt-stack/yolort/models/box_head.py", line 6, in <module>
    from torchvision.ops import batched_nms
ImportError: cannot import name 'batched_nms'

Environment

PyTorch version: 1.6.0
Is debug build: False
CUDA used to build PyTorch: 10.2
ROCM used to build PyTorch: N/A

OS: Ubuntu 18.04.3 LTS (x86_64)
GCC version: (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
Clang version: Could not collect
CMake version: version 3.18.20200828-gc268e26

Python version: 3.6 (64-bit runtime)
Is CUDA available: True
CUDA runtime version: 10.1.243
GPU models and configuration:
GPU 0: GeForce RTX 2080 Ti
GPU 1: Quadro P2000

Nvidia driver version: 440.33.01
cuDNN version: /usr/lib/x86_64-linux-gnu/libcudnn.so.7.6.5
HIP runtime version: N/A
MIOpen runtime version: N/A

Versions of relevant libraries:
[pip3] numpy==1.18.5
[pip3] pytorch-lightning==1.1.8
[pip3] torch==1.6.0
[pip3] torchvision==0.7.0
[conda] Could not collect

Kindly advise.

What is the performance gain after optimizations?

❓ Questions and Help

What is the performance gain after optimization with this method?
Is there a breakdown across the different yolov5 models (s, m, l, x) and across different batch sizes?

Additional context

It would be helpful to get a rough idea of the performance gains on a given GPU. This can also be useful for choosing between optimization frameworks like LibTorch, TorchServe, TensorRT, etc.

Thanks for this awesome repo.

Support ultralytics released v4.0 stacks

🚀 Feature

ultralytics has already released the v4.0 model checkpoints here; it's time to support this version.

Pitch

I've reserved interfaces to support the v4.0 version, such as

https://github.com/zhiqwang/yolov5-rt-stack/blob/f34b032ea3755f62a3fe7949a613043ebbed5586/models/darknet.py#L69-L70

and

https://github.com/zhiqwang/yolov5-rt-stack/blob/f34b032ea3755f62a3fe7949a613043ebbed5586/models/path_aggregation_network.py#L52-L53

But there are other parts like the activation function that need to be compatible.

Support YOLOv5 P6 models

🐛 Bug

torch.testing.assert_allclose() fails for yolov5-v5.0 models.

Traceback (most recent call last):
  File "convert_ultralytics_to_rt-stack.py", line 122, in <module>
    torch.testing.assert_allclose(
  File "/usr/local/lib/python3.8/dist-packages/torch/testing/__init__.py", line 215, in assert_allclose
    raise AssertionError("expected tensor shape {0} doesn't match with actual tensor "
AssertionError: expected tensor shape torch.Size([3]) doesn't match with actual tensor shape torch.Size([4])!

However, it does succeed for LibTorch C++ inference! So I'd consider this low priority, but wanted to make a note of it.

This might have something to do with the fact that v5.0 of yolov5 adds support for 4 output layers. However, this is the 3 output layer version (YOLOv5-P5 aka yolov5s.pt) vs the 4 output layer version (YOLOv5-P6 aka yolov5s6.pt).

torch.testing.assert_allclose() succeeds for yolov5-v4.0 models, as does LibTorch C++ inference.

To Reproduce

Follow the how-to-align-with-ultralytics-yolov5.ipynb notebook to the "Varify the detection results between yolort and ultralytics" section and use a yolov5s-v5.0.pt file for weights.

Expected behavior

The assert completes without error, as it does for yolov5-v4.0.pt weights.

Environment

Click to display environment

root@pc:~# python3 -m torch.utils.collect_env
Collecting environment information...
PyTorch version: 1.8.0a0+56b43f4
Is debug build: False
CUDA used to build PyTorch: 11.2
ROCM used to build PyTorch: N/A

OS: Ubuntu 20.04.1 LTS (x86_64)
GCC version: (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0
Clang version: Could not collect
CMake version: version 3.16.3

Python version: 3.8 (64-bit runtime)
Is CUDA available: True
CUDA runtime version: Could not collect
GPU models and configuration: 
GPU 0: GeForce GTX 1080
GPU 1: GeForce GTX 1080
GPU 2: GeForce GTX 1080

Nvidia driver version: 460.84
cuDNN version: Probably one of the following:
/usr/lib/x86_64-linux-gnu/libcudnn.so.8.1.0
/usr/lib/x86_64-linux-gnu/libcudnn_adv_infer.so.8.1.0
/usr/lib/x86_64-linux-gnu/libcudnn_adv_train.so.8.1.0
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_infer.so.8.1.0
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_train.so.8.1.0
/usr/lib/x86_64-linux-gnu/libcudnn_ops_infer.so.8.1.0
/usr/lib/x86_64-linux-gnu/libcudnn_ops_train.so.8.1.0
HIP runtime version: N/A
MIOpen runtime version: N/A

Versions of relevant libraries:
[pip3] numpy==1.21.0
[pip3] pytorch-lightning==1.3.8
[pip3] torch==1.8.0a0+56b43f4
[pip3] torchmetrics==0.4.1
[pip3] torchvision==0.9.0a0+01dfa8e
[conda] Could not collect

  • PyTorch / torchvision Version (e.g., 1.0 / 0.4.0): 1.8.0 / 0.9.0
  • OS (e.g., Linux): Ubuntu 20.04
  • How you installed PyTorch / torchvision (conda, pip, source): source / source
  • Python version: 3.8
  • CUDA/cuDNN version: 11.2
  • GPU models and configuration: 3x GeForce GTX 1080

A batch without any targets causes the script to fail

🐛 Bug

To Reproduce (REQUIRED)

Steps to reproduce the behavior:

Default training, but use a dataset that has images without any bounding boxes.

Expected behavior

No error should be thrown.

Environment

Please copy and paste the output from our
environment collection script
(or fill out the checklist below manually).

You can get the script and run it with:

wget https://raw.githubusercontent.com/pytorch/pytorch/master/torch/utils/collect_env.py
# For security purposes, please check the contents of collect_env.py before running it.
python collect_env.py

PyTorch version: 1.9.0+cu111
Is debug build: False
CUDA used to build PyTorch: 11.1
ROCM used to build PyTorch: N/A

OS: Ubuntu 20.04.2 LTS (x86_64)
GCC version: (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0
Clang version: Could not collect
CMake version: version 3.16.3
Libc version: glibc-2.31

Python version: 3.8.10 (default, Jun 2 2021, 10:49:15) [GCC 9.4.0] (64-bit runtime)
Python platform: Linux-5.8.0-59-generic-x86_64-with-glibc2.29
Is CUDA available: True
CUDA runtime version: 10.1.243
GPU models and configuration: GPU 0: NVIDIA GeForce RTX 3090
Nvidia driver version: 465.19.01
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A

Versions of relevant libraries:
[pip3] numpy==1.21.0
[pip3] pytorch-lightning==1.3.0rc1
[pip3] torch==1.9.0+cu111
[pip3] torch2trt-unofficial==0.0.3
[pip3] torchaudio==0.9.0
[pip3] torchfile==0.1.0
[pip3] torchmetrics==0.4.0
[pip3] torchvision==0.10.0+cu111
[pip3] torchviz==0.0.2

Additional context

The error is raised at line 106 of models/transform.py, because torch.cat can't process an empty list.
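A minimal illustration of the failure mode, with a hypothetical zero-box fallback as a guard (not the repo's actual patch):

import torch

boxes_per_image = []  # an image without any targets yields an empty list
# torch.cat([]) raises: RuntimeError: torch.cat(): expected a non-empty list of Tensors
boxes = torch.cat(boxes_per_image) if boxes_per_image else torch.zeros((0, 4))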

Roadmap of yolort

🚀 The feature

Support exporting yolort to more inference backends.

Motivation, pitch

yolort supports exporting both end-to-end TorchScript and ONNX graphs.

It seems that TorchScript is attracting more and more attention in industrial production environments. For example, some edge inference libraries such as CoreML, ncnn, TFLite, TVM, TensorRT and MNN have added or are adding support for converting TorchScript to their own IR models.

Add a dataloader as ultralytics in detection pipeline

The outputs are great, although not the same as yolov5; maybe some pre-processing/post-processing steps are different.

That's a great catch! I think it is caused by different pre-processing operations. I've verified the post-processing stages before; they produce the same results as ultralytics/yolov5 (when predicting without TTA). I've also uploaded a notebook to verify the model inference and post-processing stages; it is a bit outdated now and I haven't had enough time to update it. I plan to add a dataloader to the predict pipeline so that yolort detects the same results as ultralytics.

Looks like you forgot to convert color space from bgr to rgb in your "inference-pytorch-export-libtorch.ipynb"?

In my impression, ultralytics uses the BGR channel order by default, but I am not very sure and need to double-check. It also seems the default image dataloader uses the RGB channel order, so if you pass an image path to the model and use model.predict('image_path') to detect an image, the result will be wrong; this also needs further verification.
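For reference, a minimal snippet for converting an OpenCV-loaded (BGR) image to RGB before feeding it to the model:

import cv2

img_bgr = cv2.imread("bus.jpg")  # OpenCV decodes images as BGR by default
img_rgb = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2RGB)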

BTW, all PRs are welcome here.

Originally posted by @stereomatchingkiss and @zhiqwang in #90 (comment)

Error when converting custom model

Thanks. I trained the yolov5 model on Kaggle (v4.0), copied out best.pt, and used the yolov5 repo to save the model's state_dict; otherwise torch.load throws 'ModuleNotFoundError: No module named 'models''.

import torch

from models.experimental import attempt_load

if __name__ == '__main__':
    model = attempt_load('best.pt', map_location='cpu')  # load FP32 model
    torch.save(model.state_dict(), 'bn.pt')

After that, I converted the custom model with yolov5-rt-stack:

import torch
from yolort.utils import update_module_state_from_ultralytics

# Update module state from ultralytics
model = update_module_state_from_ultralytics(arch='yolov5s', version='v4.0', custom_path_or_model = torch.load('./best.pt'), num_classes = 1)
# Save updated module
torch.save(model.state_dict(), 'yolov5s_updated.pt')

This time I got the error message:

model = update_module_state_from_ultralytics(arch='yolov5s', version='v4.0', custom_path_or_model = torch.load('./bn.pt'), num_classes = 1)
 File "C:\Users\yyyy\programs\Qt\3rdLibs\pytorch_projects\yolov5-rt-stack\yolort\utils\update_module_state.py", line 61, in update_module_state_from_ultralytics
   model = torch.hub.load(f'ultralytics/yolov5:{version}', 'custom', path_or_model=custom_path_or_model)
 File "C:\Users\yyyy\Anaconda3\envs\ppyolo\lib\site-packages\torch\hub.py", line 370, in load
   model = _load_local(repo_or_dir, model, *args, **kwargs)
 File "C:\Users\yyyy\Anaconda3\envs\ppyolo\lib\site-packages\torch\hub.py", line 399, in _load_local
   model = entry(*args, **kwargs)
 File "C:\Users\yyyy/.cache\torch\hub\ultralytics_yolov5_v4.0\hubconf.py", line 123, in custom
   model = model['model']  # load model
KeyError: 'model'

Any clues how to solve this? Thanks!

Could not find any similar ops to torchvision::nms. This op may not exist or may not be currently supported in TorchScript.

🐛 Bug

When attempting to load the yolov5-rt-stack model (with NMS post-processing) in C++, the following error appears:

>>> Loading model
>>> Other error: 
Unknown builtin op: torchvision::nms.
Could not find any similar ops to torchvision::nms. This op may not exist or may not be currently supported in TorchScript.
:
  File "/usr/local/lib/python3.8/dist-packages/torchvision-0.10.0a0+300a8a4-py3.8-linux-x86_64.egg/torchvision/ops/boxes.py", line 35
    """
    _assert_has_ops()
    return torch.ops.torchvision.nms(boxes, scores, iou_threshold)
           ~~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
Serialized   File "code/__torch__/torchvision/ops/boxes.py", line 130
  _39 = __torch__.torchvision.extension._assert_has_ops
  _40 = _39()
  _41 = ops.torchvision.nms(boxes, scores, iou_threshold)
        ~~~~~~~~~~~~~~~~~~~ <--- HERE
  return _41
'nms' is being compiled since it was called from '_batched_nms_vanilla'
  File "/usr/local/lib/python3.8/dist-packages/torchvision-0.10.0a0+300a8a4-py3.8-linux-x86_64.egg/torchvision/ops/boxes.py", line 102
    for class_id in torch.unique(idxs):
        curr_indices = torch.where(idxs == class_id)[0]
        curr_keep_indices = nms(boxes[curr_indices], scores[curr_indices], iou_threshold)
                            ~~~ <--- HERE
        keep_mask[curr_indices[curr_keep_indices]] = True
    keep_indices = torch.where(keep_mask)[0]
Serialized   File "code/__torch__/torchvision/ops/boxes.py", line 96
    _22 = torch.index(boxes, _21)
    _23 = annotate(List[Optional[Tensor]], [curr_indices])
    curr_keep_indices = __torch__.torchvision.ops.boxes.nms(_22, torch.index(scores, _23), iou_threshold, )
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
    _24 = annotate(List[Optional[Tensor]], [curr_keep_indices])
    _25 = torch.index(curr_indices, _24)
'_batched_nms_vanilla' is being compiled since it was called from 'batched_nms'
Serialized   File "code/__torch__/torchvision/ops/boxes.py", line 5
    idxs: Tensor,
    iou_threshold: float) -> Tensor:
  _0 = __torch__.torchvision.ops.boxes._batched_nms_vanilla
  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
  _1 = __torch__.torchvision.ops.boxes._batched_nms_coordinate_trick
  if torch.gt(torch.numel(boxes), 4000):
'batched_nms' is being compiled since it was called from 'PostProcess.forward'
Serialized   File "code/__torch__/yolort/models/box_head.py", line 84
    head_outputs: List[Tensor],
    anchors_tuple: Tuple[Tensor, Tensor, Tensor]) -> List[Dict[str, Tensor]]:
    _11 = __torch__.torchvision.ops.boxes.batched_nms
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
    batch_size, _12, _13, _14, K, = torch.size(head_outputs[0])
    all_pred_logits = annotate(List[Tensor], [])

To Reproduce (REQUIRED)

Steps to reproduce the behavior:

  1. I was unable to reproduce this with an MCVE, so it probably is an issue with my project. But I wanted to document it here in hopes of helping someone else.

Expected behavior

The model loads via torch::jit::load() without issue.

Environment

# python3 -m torch.utils.collect_env 
Collecting environment information...
PyTorch version: 1.9.0a0+gitd69c22d
Is debug build: False
CUDA used to build PyTorch: 11.2
ROCM used to build PyTorch: N/A

OS: Ubuntu 20.04.2 LTS (x86_64)
GCC version: (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0
Clang version: Could not collect
CMake version: version 3.21.1
Libc version: glibc-2.31

Python version: 3.8 (64-bit runtime)
Python platform: Linux-5.4.0-80-generic-x86_64-with-glibc2.29
Is CUDA available: True
CUDA runtime version: 11.2.152
GPU models and configuration: 
GPU 0: GeForce GTX 1080
GPU 1: GeForce GTX 1080
GPU 2: GeForce GTX 1080

Nvidia driver version: 460.91.03
cuDNN version: Probably one of the following:
/usr/lib/x86_64-linux-gnu/libcudnn.so.8.1.0
/usr/lib/x86_64-linux-gnu/libcudnn_adv_infer.so.8.1.0
/usr/lib/x86_64-linux-gnu/libcudnn_adv_train.so.8.1.0
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_infer.so.8.1.0
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_train.so.8.1.0
/usr/lib/x86_64-linux-gnu/libcudnn_ops_infer.so.8.1.0
/usr/lib/x86_64-linux-gnu/libcudnn_ops_train.so.8.1.0
HIP runtime version: N/A
MIOpen runtime version: N/A

Versions of relevant libraries:
[pip3] numpy==1.21.1
[pip3] pytorch-lightning==1.3.8
[pip3] torch==1.9.0a0+gitd69c22d
[pip3] torchmetrics==0.4.1
[pip3] torchvision==0.10.0a0+300a8a4
[conda] Could not collect

Mismatch between ultralytics v4.0 predictions and model created from update_module_state_from_ultralytics

🐛 Bug

Hi! :)

I am currently training a model in ultralytics v4.0 and want to compile it to TVM, with yolov5-rt-stack as the intermediate step. I am comparing the outputs of ultralytics v4.0 and the model created from update_module_state_from_ultralytics for the same image.

To Reproduce (REQUIRED)

Steps to reproduce the behavior:

  1. Load ultralytics model:
import torch

from models.experimental import attempt_load
from utils.general import check_img_size, non_max_suppression, scale_coords

model = attempt_load('.last.pt', map_location='cpu')  # load FP32 model
model.eval()

with torch.no_grad():
    pred = model(torch.from_numpy(img))[0]


# Apply NMS
conf_thres = 0.1
iou_thres = 0.45
pred = non_max_suppression(pred, conf_thres, iou_thres, agnostic=True)[0]
print(pred)
# [tensor([[2.11597e+02, 1.37264e+02, 3.22576e+02, 2.64709e+02, 2.02256e-01, 5.00000e+00],
#          [2.87690e+02, 1.53040e+02, 3.20757e+02, 2.62483e+02, 1.43088e-01, 5.00000e+00],
#          [2.90143e+02, 1.66800e+02, 3.20759e+02, 2.59762e+02, 1.10790e-01, 5.00000e+00]])]
  2. Converted model
model = update_module_state_from_ultralytics(arch='yolov5s', version='v4.0',num_classes=10,custom_path_or_model='.last.pt',set_fp16=False)
model.eval()
with torch.no_grad():
    pred = model(torch.from_numpy(img))
print(pred)
# (tensor([[139.78711,  49.54448, 394.38562, 352.42865],
#          [266.29352,  77.71037, 342.15277, 337.81256],
#          [283.28748, 115.52697, 326.03149, 323.86658]]),
#  tensor([0.20226, 0.14309, 0.06404]), tensor([5, 5, 5]))
  3. Compare predictions

The classes and scores seem to be the same (at least for the first two bboxes), but the bboxes themselves have different coordinates. Do you have an idea where the mismatch lies? As far as I can see, ultralytics does not do any post-processing other than non_max_suppression. Do you have a test script where you tested and compared the outputs of update_module_state_from_ultralytics?

Thanks!

Edited by @zhiqwang for code formatting

Add a batch dimension

I was trying to add a batch dimension to the ONNX model and run inference on multiple images concurrently. While doing that I faced this issue:

torch.onnx.export(
  File "/home/gogetter/anaconda3/envs/yolov5_v31/lib/python3.8/site-packages/torch/onnx/__init__.py", line 225, in export
return utils.export(model, args, f, export_params, verbose, training,
  File "/home/gogetter/anaconda3/envs/yolov5_v31/lib/python3.8/site-packages/torch/onnx/utils.py", line 85, in export
_export(model, args, f, export_params, verbose, training, input_names, output_names,
  File "/home/gogetter/anaconda3/envs/yolov5_v31/lib/python3.8/site-packages/torch/onnx/utils.py", line 632, in _export
_model_to_graph(model, args, verbose, input_names,
  File "/home/gogetter/anaconda3/envs/yolov5_v31/lib/python3.8/site-packages/torch/onnx/utils.py", line 409, in _model_to_graph
graph, params, torch_out = _create_jit_graph(model, args,
  File "/home/gogetter/anaconda3/envs/yolov5_v31/lib/python3.8/site-packages/torch/onnx/utils.py", line 379, in _create_jit_graph
graph, torch_out = _trace_and_get_graph_from_model(model, args)
  File "/home/gogetter/anaconda3/envs/yolov5_v31/lib/python3.8/site-packages/torch/onnx/utils.py", line 342, in _trace_and_get_graph_from_model
torch.jit._get_trace_graph(model, args, strict=False, _force_outplace=False, _return_inputs_states=True)
  File "/home/gogetter/anaconda3/envs/yolov5_v31/lib/python3.8/site-packages/torch/jit/_trace.py", line 1148, in _get_trace_graph
outs = ONNXTracedModule(f, strict, _force_outplace, return_inputs, _return_inputs_states)(*args, **kwargs)
  File "/home/gogetter/anaconda3/envs/yolov5_v31/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
  File "/home/gogetter/anaconda3/envs/yolov5_v31/lib/python3.8/site-packages/torch/jit/_trace.py", line 125, in forward
graph, out = torch._C._create_graph_by_tracing(
  File "/home/gogetter/anaconda3/envs/yolov5_v31/lib/python3.8/site-packages/torch/jit/_trace.py", line 116, in wrapper
outs.append(self.inner(*trace_inputs))
  File "/home/gogetter/anaconda3/envs/yolov5_v31/lib/python3.8/site-packages/torch/nn/modules/module.py", line 725, in _call_impl
result = self._slow_forward(*input, **kwargs)
  File "/home/gogetter/anaconda3/envs/yolov5_v31/lib/python3.8/site-packages/torch/nn/modules/module.py", line 709, in _slow_forward
result = self.forward(*input, **kwargs)
  File "/home/gogetter/workspace-vineet/yolov5-rt-stack/models/yolo.py", line 132, in forward
images, targets = self.transform(images, targets)
  File "/home/gogetter/anaconda3/envs/yolov5_v31/lib/python3.8/site-packages/torch/nn/modules/module.py", line 725, in _call_impl
result = self._slow_forward(*input, **kwargs)
  File "/home/gogetter/anaconda3/envs/yolov5_v31/lib/python3.8/site-packages/torch/nn/modules/module.py", line 709, in _slow_forward
result = self.forward(*input, **kwargs)
  File "/home/gogetter/anaconda3/envs/yolov5_v31/lib/python3.8/site-packages/torchvision/models/detection/transform.py", line 102, in forward
raise ValueError("images is expected to be a list of 3d tensors "
ValueError: images is expected to be a list of 3d tensors of shape [C, H, W], got torch.Size([2, 3, 1536, 2688])

I have previously added a batch dimension to ONNX models by simply expanding the input dimension, but that does not work in this case. Let me know if you have faced this issue and have any pointers for me.
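
For reference, a minimal sketch of the calling convention the error message asks for, using made-up names and the shapes from the error above: the embedded transform expects a list of [C, H, W] tensors, not a pre-stacked 4D batch.

import torch

batch = torch.rand(2, 3, 1536, 2688)  # pre-stacked 4D batch: this is what triggers the error
images = list(batch)                  # a list of two [3, 1536, 2688] tensors instead

# model(images), and likewise torch.onnx.export(model, (images,), ...), should then pass
# the "list of 3d tensors" check, since the batching happens inside the transform itself.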

Exporting fp16 model to onnx produces invalid onnx model

🐛 Bug

When exporting a half-precision (fp16) model to ONNX, it creates an invalid ONNX file. This appears to be because of a node that remains in fp32 as a result of a line in torch.nn.functional.interpolate.

To Reproduce (REQUIRED)

Steps to reproduce the behavior:

  1. Open tutorial "export-onnx-inference-onnxruntime" notebook.
  2. In the third code box, after model = model.to(device), add the line model = model.half().
  3. Continue running the notebook code. The warning below will occur at torch.onnx.export(...), and the error will occur at onnx_model = onnx.load(export_onnx_name) (see the sketch after this list).
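
A condensed sketch of those steps, assuming the tutorial notebook's own names (model, device, export_onnx_name) are already defined; the opset version here is an assumption, and the notebook's own export arguments apply:

import torch
import onnx

model = model.to(device)
model = model.half()  # the line added in step 2
model.eval()

inputs = [torch.rand(3, 640, 640, dtype=torch.float16, device=device)]
torch.onnx.export(model, (inputs,), export_onnx_name,
                  opset_version=11, input_names=["images_tensors"])  # the warning appears here
onnx_model = onnx.load(export_onnx_name)  # the failure then surfaces when the model is checked or simplified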

The relevant warning on export appears to be:

/home/david/.conda/envs/pytorch/lib/python3.7/site-packages/torch/nn/functional.py:3123: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
  dtype=torch.float32)).float())) for i in range(dim)]

The error on loading the ONNX model is:

Fail                                      Traceback (most recent call last)
<ipython-input-23-0c5db6c3f5a7> in <module>
      6     onnx_model,
      7     input_shapes={"images_tensors": [3, 640, 640]},
----> 8     dynamic_input_shape=True,
      9 )
     10 

~/.conda/envs/pytorch/lib/python3.7/site-packages/onnxsim/onnx_simplifier.py in simplify(model, check_n, perform_optimization, skip_fuse_bn, input_shapes, skipped_optimizers, skip_shape_inference, input_data, dynamic_input_shape, custom_lib)
    478         return model
    479 
--> 480     model = fixed_point(model, infer_shapes_and_optimize, constant_folding)
    481 
    482     # Overwrite model input shape

~/.conda/envs/pytorch/lib/python3.7/site-packages/onnxsim/onnx_simplifier.py in fixed_point(x, func_a, func_b)
    379     """
    380     x = func_a(x)
--> 381     x = func_b(x)
    382     while True:
    383         y = func_a(x)

~/.conda/envs/pytorch/lib/python3.7/site-packages/onnxsim/onnx_simplifier.py in constant_folding(model)
    472                                        input_shapes=updated_input_shapes,
    473                                        input_data=input_data,
--> 474                                        custom_lib=custom_lib)
    475         const_nodes = clean_constant_nodes(const_nodes, res)
    476         model = eliminate_const_nodes(model, const_nodes, res)

~/.conda/envs/pytorch/lib/python3.7/site-packages/onnxsim/onnx_simplifier.py in forward_for_node_outputs(model, nodes, input_shapes, input_data, custom_lib)
    227                   input_data=input_data,
    228                   input_shapes=input_shapes,
--> 229                   custom_lib=custom_lib)
    230     return res
    231 

~/.conda/envs/pytorch/lib/python3.7/site-packages/onnxsim/onnx_simplifier.py in forward(model, input_data, input_shapes, custom_lib)
    193     sess_options.log_severity_level = 3
    194     sess = rt.InferenceSession(model.SerializeToString(
--> 195     ), sess_options=sess_options, providers=['CPUExecutionProvider'])
    196 
    197     input_names = get_input_names(model)

~/.conda/envs/pytorch/lib/python3.7/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py in __init__(self, path_or_bytes, sess_options, providers, provider_options)
    278 
    279         try:
--> 280             self._create_inference_session(providers, provider_options)
    281         except RuntimeError:
    282             if self._enable_fallback:

~/.conda/envs/pytorch/lib/python3.7/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py in _create_inference_session(self, providers, provider_options)
    307             sess = C.InferenceSession(session_options, self._model_path, True, self._read_config_from_model)
    308         else:
--> 309             sess = C.InferenceSession(session_options, self._model_bytes, False, self._read_config_from_model)
    310 
    311         # initialize the C++ InferenceSession

Fail: [ONNXRuntimeError] : 1 : FAIL : Type Error: Type parameter (T) bound to different types (tensor(float) and tensor(float16) in node (Concat_929).

Expected behavior

Successful execution of the tutorial notebook when the model is converted to half precision.

Environment

[pip3] numpy==1.19.2
[pip3] pytorch-lightning==1.3.0rc1
[pip3] torch==1.7.1
[pip3] torchaudio==0.7.0a0+a853dff
[pip3] torchmetrics==0.3.2
[pip3] torchvision==0.8.2
[conda] blas 1.0 mkl
[conda] cudatoolkit 10.2.89 hfd86e86_1
[conda] mkl 2020.2 256
[conda] mkl-service 2.3.0 py37he8ac12f_0
[conda] mkl_fft 1.3.0 py37h54f3939_0
[conda] mkl_random 1.1.1 py37h0573a6f_0
[conda] numpy 1.19.2 py37h54aff64_0
[conda] numpy-base 1.19.2 py37hfa32c7d_0
[conda] pytorch 1.7.1 py3.7_cuda10.2.89_cudnn7.6.5_0 pytorch
[conda] pytorch-lightning 1.3.0rc1 pypi_0 pypi
[conda] torchaudio 0.7.2 py37 pytorch
[conda] torchmetrics 0.3.2 pypi_0 pypi
[conda] torchvision 0.8.2 py37_cu102 pytorch

Additional context

It looks like a PyTorch issue, but I'm not sure how we are using this interpolate function. Perhaps we can find a workaround?
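
One common workaround, sketched under the assumption that the plain fp32 export succeeds: export in fp32 and convert the resulting ONNX graph to fp16 afterwards with onnxconverter-common, whose converter keeps unsupported nodes in fp32 rather than leaving mixed-type edges behind. The file names below are placeholders.

import onnx
from onnxconverter_common import float16

model_fp32 = onnx.load("yolov5s.onnx")  # the fp32 export
model_fp16 = float16.convert_float_to_float16(model_fp32)
onnx.save(model_fp16, "yolov5s_fp16.onnx")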

Support auto-evolved anchors as in ultralytics

When converting a custom yolov5 model to yolort, the anchor configuration is fixed. This causes a large gap between the two models' predictions.

These are my test results, for reference.

Detection results with ultralytics
tensor([[ 60.20518, 275.15906, 227.31180, 460.78314,   0.95166,   2.00000],
        [239.06186,  78.73691, 411.81058, 264.97150,   0.95022,   0.00000],
        [245.05939, 421.12323, 407.59259, 505.68793,   0.94337,   2.00000],
        [241.37396, 348.25543, 413.31641, 406.12311,   0.93951,   0.00000],
        [ 68.77753, 465.43747, 217.82889, 519.14197,   0.93248,   5.00000],
        [245.88387, 509.54541, 404.57898, 541.54272,   0.92447,   5.00000],
        [ 60.48768,  78.21830, 225.84097, 269.68933,   0.91452,   0.00000],
        [242.17107, 550.16064, 413.19885, 585.58118,   0.90575,   0.00000],
        [ 60.04885, 564.58539, 184.14281, 573.51935,   0.88148,   6.00000],
        [257.21490, 295.98309, 397.26947, 315.52979,   0.87154,   7.00000],
        [270.75903, 335.44797, 379.67609, 346.24161,   0.86142,   7.00000],
        [276.42148, 269.73453, 380.06650, 281.44186,   0.85676,   7.00000],
        [ 58.77311, 539.35565, 129.32153, 550.79681,   0.84660,   6.00000],
        [299.25656, 317.71469, 353.17801, 333.91336,   0.84569,   7.00000],
        [ 70.90355, 576.13129, 228.42677, 586.76593,   0.82126,   0.00000],
        [272.83978, 282.16409, 379.86005, 293.18692,   0.67504,   7.00000]])
Detection results with yolort

boxes:

tensor([[114.73997, 286.17834, 172.77701, 449.76385],
        [296.58661, 128.59970, 354.28583, 215.10870],
        [298.10165, 426.14325, 354.55032, 500.66791],
        [310.67633, -24.03447, 340.96069, 367.68439],
        [298.63022, 363.74905, 356.06015, 390.62949],
        [310.64130, 375.32629, 340.15778, 551.87183],
        [279.98190,  71.63794, 372.35941, 273.37885],
        [281.49362, 345.05667, 372.36020, 408.86185],
        [117.42003, 468.62558, 169.18639, 515.95392],
        [318.71570, 317.67377, 335.81323, 436.61597],
        [226.89404,  79.70534, 424.16278, 265.10114],
        [ 46.96851, 273.46609, 239.71344, 461.19077],
        [315.01773, 509.35077, 335.44510, 541.73737],
        [302.44864, 511.76874, 347.06296, 539.38751],
        [114.45029,  89.58469, 171.87836, 258.32294],
        [317.77310, 491.92383, 333.15158, 558.55872],
        [299.12274, 559.64423, 356.24719, 576.09760],
        [318.85751, 531.31708, 335.88858, 604.20856],
        [135.67125, 436.88367, 151.06697, 547.62537],
        [ 44.50909,  77.33535, 241.77551, 270.65427],
        [114.10909, 564.53101, 130.08257, 573.57373],
        [128.94606, 174.29877, 157.97704, 560.87738],
        [318.22821, 295.86420, 336.25616, 315.64868],
        [127.40814, -24.54239, 158.43588, 373.00723],
        [231.56476, 419.62057, 420.35730, 506.79044],
        [105.16776, 565.24457, 139.56566, 572.73053],
        [319.92688, 283.30414, 333.55621, 328.24921],
        [316.60138, 550.55896, 338.71069, 585.45984],
        [318.20761, 335.38226, 332.22751, 346.30725],
        [307.69409, 297.45999, 346.46869, 314.14407],
        [310.38986, 336.16382, 340.24734, 345.19263],
        [321.57333, 269.66330, 334.91464, 281.51309],
        [116.27255, 558.97729, 128.35686, 579.16809],
        [319.82718, 328.34348, 330.08755, 353.56308],
        [ 89.50679, 539.28607,  98.58784, 550.86639],
        [322.74689, 317.61615, 329.68768, 334.01190],
        [313.31296, 270.62665, 342.22092, 280.61163],
        [318.55316, 318.89578, 333.70508, 332.48337],
        [323.54080, 307.02875, 328.81540, 344.24860],
        [ 84.36114, 540.25403, 104.05352, 550.03149],
        [323.00452, 262.09003, 333.10120, 289.10217],
        [ 90.60516, 532.02325,  97.54469, 558.19513],
        [127.85805, 576.88733, 171.47224, 586.00989],
        [141.52585, 569.54669, 157.04645, 593.17780],
        [139.58984, 575.94946, 159.91479, 586.71265],
        [319.46204, 282.09702, 333.23779, 293.25394],
        [311.79651, 283.07559, 341.74689, 292.15579],
        [321.21085, 275.04596, 331.59256, 300.28229],
        [322.56943, 287.68039, 336.84097, 324.22037]])

scores:

tensor([0.95166, 0.95022, 0.94337, 0.94313, 0.93951, 0.93825, 0.93778, 0.93444, 0.93248, 0.93215, 0.92636, 0.92569,
        0.92447, 0.92406, 0.91452, 0.90793, 0.90575, 0.90452, 0.89434, 0.89394, 0.88148, 0.87790, 0.87154, 0.86893,
        0.86798, 0.86740, 0.86400, 0.86210, 0.86142, 0.86071, 0.85727, 0.85676, 0.85073, 0.85071, 0.84660, 0.84569,
        0.84113, 0.83803, 0.83582, 0.83308, 0.82758, 0.82343, 0.82126, 0.79421, 0.78466, 0.67504, 0.66505, 0.66226,
        0.28443])

labels:

tensor([2, 0, 2, 0, 0, 2, 0, 0, 5, 0, 0, 2, 5, 5, 0, 5, 0, 0, 5, 0, 6, 2, 7, 0, 2, 6, 7, 0, 7, 7, 7, 7, 6, 7, 6, 7, 7, 7, 7, 6, 7, 6, 0, 0, 0, 7, 7, 7, 0])

I also converted the official model, and there the two models show no difference. So I hope someone can provide a way to change the default anchor configuration.
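
For reference, the evolved anchors are stored in the Detect head of the ultralytics checkpoint and can be read out as below; this is a sketch that assumes the standard ultralytics checkpoint layout and that the ultralytics repo is importable (which unpickling these checkpoints requires), with 'last.pt' as a placeholder path.

import torch

ckpt = torch.load("last.pt", map_location="cpu")
detect = ckpt["model"].model[-1]  # the Detect head in ultralytics checkpoints
print(detect.anchors)             # per-level anchors, already divided by the stride
print(detect.stride)              # strides of the detection levels

These values would then need to be wired into yolort's anchor generator in place of the hard-coded defaults.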

Model inference issue with the inference-pytorch-export-libtorch.ipynb notebook

🐛 Bug

Inference did not work with this notebook.

To Reproduce (REQUIRED)

Run:

# Perform inference on an image tensor
model_out = model.predict(img)

Error:

RuntimeError                              Traceback (most recent call last)
<ipython-input-8-119eb7698a3c> in <module>
      1 # Perform inference on an image tensor
----> 2 model_out = model.predict(img)

/usr/local/lib/python3.6/dist-packages/torch/autograd/grad_mode.py in decorate_context(*args, **kwargs)
     24         def decorate_context(*args, **kwargs):
     25             with self.__class__():
---> 26                 return func(*args, **kwargs)
     27         return cast(F, decorate_context)
     28 

/data/sourabh/Play_n_n_Learn/CV_DL/yolov5_exp/yolov5-rt-stack/yolort/models/yolo_module.py in predict(self, x, batch_idx, skip_collate_fn, dataloader_idx, data_pipeline)
    129         images, _ = batch if len(batch) == 2 and isinstance(batch, (list, tuple)) else (batch, None)
    130         images = [img.to(self.device) for img in images]
--> 131         predictions = self.forward(images)
    132         output = data_pipeline.uncollate_fn(predictions)  # TODO: pass batch and x
    133         return output

/data/sourabh/Play_n_n_Learn/CV_DL/yolov5_exp/yolov5-rt-stack/yolort/models/yolo_module.py in forward(self, inputs, targets)
     78         samples, targets = self.transform(inputs, targets)
     79         # Compute the detections
---> 80         detections = self.model(samples.tensors, targets=targets)
     81         # Rescale coordinate
     82         detections = self.transform.postprocess(detections, samples.image_sizes, original_image_sizes)

/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
    725             result = self._slow_forward(*input, **kwargs)
    726         else:
--> 727             result = self.forward(*input, **kwargs)
    728         for hook in itertools.chain(
    729                 _global_forward_hooks.values(),

/data/sourabh/Play_n_n_Learn/CV_DL/yolov5_exp/yolov5-rt-stack/yolort/models/yolo.py in forward(self, samples, targets)
    114         else:
    115             # compute the detections
--> 116             detections = self.post_process(head_outputs, anchors_tuple)
    117 
    118         if torch.jit.is_scripting():

/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
    725             result = self._slow_forward(*input, **kwargs)
    726         else:
--> 727             result = self.forward(*input, **kwargs)
    728         for hook in itertools.chain(
    729                 _global_forward_hooks.values(),

/data/sourabh/Play_n_n_Learn/CV_DL/yolov5_exp/yolov5-rt-stack/yolort/models/box_head.py in forward(self, head_outputs, anchors_tuple)
    375 
    376             # non-maximum suppression, independently done per level
--> 377             keep = batched_nms(boxes, scores, labels, self.nms_thresh)
    378             # keep only topk scoring head_outputs
    379             keep = keep[:self.detections_per_img]

/usr/local/lib/python3.6/dist-packages/torch/jit/_trace.py in wrapper(*args, **kwargs)
   1098         if not is_tracing():
   1099             # Not tracing, don't do anything
-> 1100             return fn(*args, **kwargs)
   1101 
   1102         compiled_fn = script(wrapper.__original_fn)  # type: ignore

/usr/local/lib/python3.6/dist-packages/torchvision/ops/boxes.py in batched_nms(boxes, scores, idxs, iou_threshold)
     86         offsets = idxs.to(boxes) * (max_coordinate + torch.tensor(1).to(boxes))
     87         boxes_for_nms = boxes + offsets[:, None]
---> 88         keep = nms(boxes_for_nms, scores, iou_threshold)
     89         return keep
     90 

/usr/local/lib/python3.6/dist-packages/torchvision/ops/boxes.py in nms(boxes, scores, iou_threshold)
     40     """
     41     _assert_has_ops()
---> 42     return torch.ops.torchvision.nms(boxes, scores, iou_threshold)
     43 
     44 

RuntimeError: Could not run 'torchvision::nms' with arguments from the 'CUDA' backend. 'torchvision::nms' is only available for these backends: [CPU, BackendSelect, Named, AutogradOther, AutogradCPU, AutogradCUDA, AutogradXLA, Tracer, Autocast, Batched, VmapMode].

CPU: registered at /root/project/torchvision/csrc/vision.cpp:59 [kernel]
BackendSelect: fallthrough registered at /pytorch/aten/src/ATen/core/BackendSelectFallbackKernel.cpp:3 [backend fallback]
Named: registered at /pytorch/aten/src/ATen/core/NamedRegistrations.cpp:7 [backend fallback]
AutogradOther: fallthrough registered at /pytorch/aten/src/ATen/core/VariableFallbackKernel.cpp:35 [backend fallback]
AutogradCPU: fallthrough registered at /pytorch/aten/src/ATen/core/VariableFallbackKernel.cpp:39 [backend fallback]
AutogradCUDA: fallthrough registered at /pytorch/aten/src/ATen/core/VariableFallbackKernel.cpp:43 [backend fallback]
AutogradXLA: fallthrough registered at /pytorch/aten/src/ATen/core/VariableFallbackKernel.cpp:47 [backend fallback]
Tracer: fallthrough registered at /pytorch/torch/csrc/jit/frontend/tracer.cpp:967 [backend fallback]
Autocast: fallthrough registered at /pytorch/aten/src/ATen/autocast_mode.cpp:254 [backend fallback]
Batched: registered at /pytorch/aten/src/ATen/BatchingRegistrations.cpp:511 [backend fallback]
VmapMode: fallthrough registered at /pytorch/aten/src/ATen/VmapModeRegistrations.cpp:33 [backend fallback]

Environment

PyTorch version: 1.7.0
Is debug build: True
CUDA used to build PyTorch: 10.2
ROCM used to build PyTorch: N/A

OS: Ubuntu 18.04.3 LTS (x86_64)
GCC version: (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
Clang version: Could not collect
CMake version: version 3.18.20200828-gc268e26

Python version: 3.6 (64-bit runtime)
Is CUDA available: True
CUDA runtime version: 10.1.243
GPU models and configuration:
GPU 0: GeForce RTX 2080 Ti
GPU 1: Quadro P2000

Nvidia driver version: 440.33.01
cuDNN version: /usr/lib/x86_64-linux-gnu/libcudnn.so.7.6.5
HIP runtime version: N/A
MIOpen runtime version: N/A

Versions of relevant libraries:
[pip3] numpy==1.18.5
[pip3] pytorch-lightning==1.1.8
[pip3] torch==1.7.0
[pip3] torchvision==0.8.0
[conda] Could not collect
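
The key hint is in the backend list above: 'torchvision::nms' only has a CPU kernel registered, which usually means the installed torchvision was built without CUDA support or against a different CUDA toolkit than PyTorch. A quick diagnostic, sketched:

import torch
import torchvision
from torchvision.ops import nms

print(torch.__version__, torch.version.cuda, torchvision.__version__)

boxes = torch.tensor([[0., 0., 10., 10.]], device="cuda")
scores = torch.tensor([0.9], device="cuda")
print(nms(boxes, scores, 0.5))  # raises the same RuntimeError on a CPU-only torchvision build

If the direct call fails too, reinstalling a torchvision build that matches the CUDA-enabled PyTorch typically resolves it.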

TypeError: string indices must be integers

I am trying to train a model on some custom data and keep running into this issue in model.train() mode.

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-49-125710d48032> in <module>
----> 1 out = model(f, t)

~/pytorch/lib/python3.6/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
    725             result = self._slow_forward(*input, **kwargs)
    726         else:
--> 727             result = self.forward(*input, **kwargs)
    728         for hook in itertools.chain(
    729                 _global_forward_hooks.values(),

yolov5-rt-stack/yolort/models/yolo_module.py in forward(self, inputs, targets)

yolov5-rt-stack/yolort/models/transform.py in postprocess(self, result, image_shapes, original_image_sizes)

TypeError: string indices must be integers

For context, f is a list of tensors of shape [3, 257, 79] and t is a list of dicts that look like the following:

{
   'boxes': tensor([[ 6., 14., 21., 61.], [34., 15., 48., 57.]]),
   'labels': tensor([3, 3]),
   'id': tensor([0]),
   'area': tensor([705., 588.]),
   'iscrowd': tensor([0, 0])
}

I feel like this is likely a very obvious issue on my end, but I can't seem to figure it out.
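
For what it's worth, the traceback pattern matches postprocess being applied to the training-mode output: with targets supplied, the forward returns a dict of losses, iterating a dict yields its string keys, and pred["boxes"] then indexes a string, which raises exactly this TypeError. A tiny sketch of that failure mode (the loss key names are made up):

import torch

losses = {"cls_loss": torch.tensor(0.5), "box_loss": torch.tensor(0.2)}
for pred in losses:         # iterating a dict yields its string keys
    boxes = pred["boxes"]   # TypeError: string indices must be integers

So the target dicts themselves are probably fine; the postprocess step just should not run on the loss dict in train mode.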
