shining-love / tensorrtx

This project forked from wang-xinyu/tensorrtx


Implementation of popular deep learning networks with TensorRT network definition API

License: MIT License

CMake 1.89% C++ 89.39% Cuda 6.57% Python 2.16%


TensorRTx

TensorRTx aims to implement popular deep learning networks with the TensorRT network definition APIs. As we know, TensorRT has built-in parsers, including caffeparser, uffparser, onnxparser, etc. But when we use these parsers, we often run into "unsupported operations or layers" problems, especially when state-of-the-art models use new types of layers.

So why don't we just skip the parsers? We can use the TensorRT network definition APIs to build the whole network ourselves; it's not so complicated.

I wrote this project to get familiar with the TensorRT API, and also to share with and learn from the community.

All the models are first implemented in PyTorch or MXNet and exported to a weights file xxx.wts; TensorRT then loads the weights, defines the network and runs inference. Some PyTorch implementations can be found in my repo Pytorchx; the remaining ones are from popular open-source implementations.
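
The export step can be illustrated with a minimal, standard-library-only sketch. It assumes the plain-text layout produced by the repo's gen_wts-style export scripts: a first line with the number of tensors, then one line per tensor holding its name, its element count, and each float32 value as the 8 hex digits of its big-endian bit pattern. `write_wts` and `read_wts` are hypothetical helper names, and a plain dict of name to flat list of floats stands in for a real PyTorch `state_dict` (which would be flattened with `reshape(-1)` first).

```python
import io
import struct
from typing import Dict, List, TextIO


def write_wts(weights: Dict[str, List[float]], f: TextIO) -> None:
    """Write tensors in the assumed .wts text layout (hypothetical helper)."""
    f.write(f"{len(weights)}\n")
    for name, values in weights.items():
        f.write(f"{name} {len(values)}")
        for v in values:
            # Big-endian float32 bit pattern as 8 hex digits.
            f.write(" " + struct.pack(">f", float(v)).hex())
        f.write("\n")


def read_wts(f: TextIO) -> Dict[str, List[float]]:
    """Parse the assumed .wts layout back into name -> list of floats."""
    count = int(f.readline())
    weights: Dict[str, List[float]] = {}
    for _ in range(count):
        name, size, *hex_values = f.readline().split()
        if len(hex_values) != int(size):
            raise ValueError(f"{name}: expected {size} values, got {len(hex_values)}")
        weights[name] = [struct.unpack(">f", bytes.fromhex(h))[0] for h in hex_values]
    return weights


if __name__ == "__main__":
    buf = io.StringIO()
    write_wts({"conv1.weight": [0.5, -1.0, 2.0], "conv1.bias": [0.25]}, buf)
    print(buf.getvalue())
    buf.seek(0)
    print(read_wts(buf))
```

Writing hex bit patterns instead of decimal text keeps every float32 value bit-exact without worrying about decimal rounding on either side.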

News

  • 29 Oct 2020. First INT8 quantization implementation! Please check retinaface.
  • 23 Oct 2020. Added a .wts model zoo for quick evaluation.
  • 8 Oct 2020. ChrystleMyrnaLobo added ssd(mobilenetv2).
  • 21 Sep 2020. BaofengZan added hrnet classification and a step-by-step tutorial (Chinese).
  • 16 Sep 2020. hwh-hit added ufld (Ultra-Fast-Lane-Detection, ECCV2020).
  • 13 Sep 2020. Added crnn, reaching 1000 FPS on GTX1080.
  • 28 Aug 2020. BaofengZan added a tutorial for compiling and running tensorrtx on Windows.
  • 16 Aug 2020. upczww added a Python wrapper for yolov5.
  • 14 Aug 2020. Updated yolov5 to the v3.0 release.
  • 3 Aug 2020. BaofengZan implemented yolov5 s/m/l/x (yolov5 v2.0 release).
  • 28 May 2020. Implemented the arcface LResNet50E-IR model from deepinsight/insightface, reaching 333 FPS on GTX1080.
  • 22 May 2020. Created a new branch trt4, which uses the TensorRT 4 API. The master branch now uses the TensorRT 7 API, but only yolov4 has been migrated so far; the rest will be migrated soon. A tutorial for migrating from TensorRT 4 to 7 is also provided.

Tutorials

Test Environment

  1. GTX1080 / Ubuntu16.04 / cuda10.0 / cudnn7.6.5 / tensorrt7.0.0 / nvinfer7.0.0 / opencv3.3

How to run

Each model folder has its own readme, which explains how to run that model.

Models

The following models are implemented.

| Name | Description |
| --- | --- |
| lenet | the simplest, as a "hello world" of this project |
| alexnet | easy to implement, all layers are supported in TensorRT |
| googlenet | GoogLeNet (Inception v1) |
| inception | Inception v3 |
| mnasnet | MNASNet with depth multiplier of 0.5 from the paper |
| mobilenetv2 | MobileNet V2 |
| mobilenetv3 | V3-small, V3-large |
| resnet | resnet-18, resnet-50 and resnext50-32x4d are implemented |
| senet | se-resnet50 |
| shufflenet | ShuffleNetV2 with 0.5x output channels |
| squeezenet | SqueezeNet 1.1 model |
| vgg | VGG 11-layer model |
| yolov3-tiny | weights and PyTorch implementation from ultralytics/yolov3 |
| yolov3 | darknet-53, weights and PyTorch implementation from ultralytics/yolov3 |
| yolov3-spp | darknet-53, weights and PyTorch implementation from ultralytics/yolov3 |
| yolov4 | CSPDarknet53, weights from AlexeyAB/darknet, PyTorch implementation from ultralytics/yolov3 |
| yolov5 | yolov5-s/m/l/x v1.0, v2.0, v3.0, PyTorch implementation from ultralytics/yolov5 |
| retinaface | resnet50 and mobilenet0.25, weights from biubug6/Pytorch_Retinaface |
| arcface | LResNet50E-IR, weights from deepinsight/insightface |
| retinafaceAntiCov | mobilenet0.25, weights from deepinsight/insightface; RetinaFace anti-COVID-19, detects faces and the mask attribute |
| dbnet | Scene Text Detection, weights from BaofengZan/DBNet.pytorch |
| crnn | PyTorch implementation from meijieru/crnn.pytorch |
| ufld | PyTorch implementation from Ultra-Fast-Lane-Detection, ECCV2020 |
| hrnet | hrnet-image-classification, PyTorch implementation from HRNet-Image-Classification |
| ssd | ssd(mobilenetv2), PyTorch implementation from qfgaohao/pytorch-ssd |

Model Zoo

The .wts files can be downloaded from the model zoo for quick evaluation. However, it is recommended to generate the .wts files yourself from the PyTorch/MXNet models, so that you can also use your own retrained models.

BaiduPan pwd: uvv2

Tricky Operations

Some tricky operations were encountered in these models. They are already solved, though there may be better solutions.

| Name | Description |
| --- | --- |
| BatchNorm | Implemented by a scale layer, used in resnet, googlenet, mobilenet, etc. |
| MaxPool2d(ceil_mode=True) | Use a padding layer before maxpool to handle ceil_mode=True, see googlenet. |
| average pool with padding | Use setAverageCountExcludesPadding() when necessary, see inception. |
| relu6 | Use Relu6(x) = Relu(x) - Relu(x-6), see mobilenet. |
| torch.chunk() | Implement 'chunk(2, dim=C)' as a TensorRT plugin, see shufflenet. |
| channel shuffle | Use two shuffle layers to implement channel_shuffle, see shufflenet. |
| adaptive pool | Use a fixed input dimension and regular average pooling, see shufflenet. |
| leaky relu | I wrote a leaky relu plugin, but PRelu in NvInferPlugin.h can also be used, see yolov3 in branch trt4. |
| yolo layer v1 | The yolo layer is implemented as a plugin, see yolov3 in branch trt4. |
| yolo layer v2 | Three yolo layers are implemented in one plugin, see yolov3-spp. |
| upsample | Replaced by a deconvolution layer, see yolov3. |
| hsigmoid | Hard sigmoid is implemented as a plugin; hsigmoid and hswish are used in mobilenetv3. |
| retinaface output decode | A plugin decodes bbox, confidence and landmarks, see retinaface. |
| mish | Mish activation is implemented as a plugin; mish is used in yolov4. |
| prelu | MXNet's prelu activation with trainable gamma is implemented as a plugin, used in arcface. |
| HardSwish | hard_swish = x * hard_sigmoid, used in yolov5 v3.0. |
| LSTM | PyTorch nn.LSTM() implemented with the TensorRT API. |
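
Several rows above replace an activation with an equivalent composition of simpler operations. The plain-Python sketch below checks those identities numerically on scalars: ReLU6 as the difference of two ReLUs, hard sigmoid as relu6(x + 3) / 6, hard swish as x times hard sigmoid, mish as x * tanh(softplus(x)), and leaky ReLU as max(x, a*x). The exact hard-sigmoid form and the 0.1 leaky slope are assumptions (they match the common PyTorch/darknet definitions), and all function names are hypothetical; this checks the math only, not the repo's plugins.

```python
import math


def relu(x: float) -> float:
    return max(x, 0.0)


def relu6_reference(x: float) -> float:
    # Direct definition: clamp to [0, 6].
    return min(max(x, 0.0), 6.0)


def relu6_from_relus(x: float) -> float:
    # Decomposition from the table: Relu6(x) = Relu(x) - Relu(x - 6).
    return relu(x) - relu(x - 6.0)


def hard_sigmoid(x: float) -> float:
    # Assumed definition: relu6(x + 3) / 6.
    return relu6_from_relus(x + 3.0) / 6.0


def hard_swish(x: float) -> float:
    # hard_swish = x * hard_sigmoid(x).
    return x * hard_sigmoid(x)


def softplus(x: float) -> float:
    # Numerically stable log(1 + exp(x)).
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def mish(x: float) -> float:
    return x * math.tanh(softplus(x))


def leaky_relu(x: float, slope: float = 0.1) -> float:
    # Equivalent to max(x, slope * x) when 0 < slope < 1 (assumed slope 0.1).
    return x if x > 0 else slope * x


if __name__ == "__main__":
    xs = [i / 4 for i in range(-40, 41)]  # -10.0 .. 10.0 in steps of 0.25
    assert all(relu6_from_relus(x) == relu6_reference(x) for x in xs)
    assert all(leaky_relu(x) == max(x, 0.1 * x) for x in xs)
    print(hard_swish(-3.0), hard_swish(3.0), hard_sigmoid(0.0), mish(0.0))
```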

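A few other rows are index or parameter rewrites rather than new activations. The standard-library sketch below illustrates four of them under stated assumptions: folding BatchNorm into a per-channel scale and shift (the x * scale + shift form a scale layer computes when its power is 1), the extra bottom/right padding that makes a floor-mode max pool reproduce PyTorch's `ceil_mode=True` output size, the channel permutation produced by ShuffleNet's view/transpose/view channel shuffle, and nearest-neighbour 2x upsampling expressed as a stride-2 transposed convolution with all-ones weights (shown in 1-D, per channel). All function names are hypothetical; this is not the repo's code.

```python
import math
from typing import List, Tuple


def fold_batchnorm(gamma: List[float], beta: List[float], mean: List[float],
                   var: List[float], eps: float = 1e-5) -> Tuple[List[float], List[float]]:
    """Per-channel (scale, shift) so that bn(x) == x * scale + shift."""
    scale = [g / math.sqrt(v + eps) for g, v in zip(gamma, var)]
    shift = [b - m * s for b, m, s in zip(beta, mean, scale)]
    return scale, shift


def pool_out(size: int, k: int, s: int, p: int, ceil_mode: bool) -> int:
    """Pooling output length following PyTorch's output-size formula."""
    span = size + 2 * p - k
    out = (-(-span // s) if ceil_mode else span // s) + 1
    # PyTorch drops a last window that would start inside the right padding.
    if ceil_mode and (out - 1) * s >= size + p:
        out -= 1
    return out


def ceil_mode_extra_pad(size: int, k: int, s: int, p: int = 0) -> int:
    """Extra bottom/right padding so floor-mode pooling matches ceil_mode=True."""
    target = pool_out(size, k, s, p, ceil_mode=True)
    return max(0, (target - 1) * s + k - (size + 2 * p))


def channel_shuffle_order(channels: int, groups: int) -> List[int]:
    """Input channel feeding each output channel after view/transpose/view."""
    per_group = channels // groups
    return [(j % groups) * per_group + j // groups for j in range(channels)]


def upsample2x_via_deconv(x: List[float]) -> List[float]:
    """1-D transposed conv with kernel 2, stride 2 and weights [1, 1]."""
    weights = [1.0, 1.0]
    out = [0.0] * (2 * len(x))
    for i, v in enumerate(x):
        for k, w in enumerate(weights):
            out[2 * i + k] += v * w
    return out


if __name__ == "__main__":
    print(fold_batchnorm([2.0], [1.0], [3.0], [4.0], eps=0.0))
    print(ceil_mode_extra_pad(112, 3, 2), channel_shuffle_order(6, 2))
    print(upsample2x_via_deconv([1.0, 2.0, 3.0]))
```

For example, a 3x3 stride-2 max pool on a 112-pixel side gives 56 outputs with ceil_mode=True but 55 in floor mode; padding one extra pixel on the bottom/right brings the floor-mode result back to 56.
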
Speed Benchmark

| Models | Device | BatchSize | Mode | Input Shape (HxW) | FPS |
| --- | --- | --- | --- | --- | --- |
| YOLOv3-tiny | Xeon E5-2620/GTX1080 | 1 | FP32 | 608x608 | 333 |
| YOLOv3(darknet53) | Xeon E5-2620/GTX1080 | 1 | FP32 | 608x608 | 39.2 |
| YOLOv3-spp(darknet53) | Xeon E5-2620/GTX1080 | 1 | FP32 | 608x608 | 38.5 |
| YOLOv4(CSPDarknet53) | Xeon E5-2620/GTX1080 | 1 | FP32 | 608x608 | 35.7 |
| YOLOv4(CSPDarknet53) | Xeon E5-2620/GTX1080 | 4 | FP32 | 608x608 | 40.9 |
| YOLOv4(CSPDarknet53) | Xeon E5-2620/GTX1080 | 8 | FP32 | 608x608 | 41.3 |
| YOLOv5-s | Xeon E5-2620/GTX1080 | 1 | FP32 | 608x608 | 142 |
| YOLOv5-s | Xeon E5-2620/GTX1080 | 4 | FP32 | 608x608 | 173 |
| YOLOv5-s | Xeon E5-2620/GTX1080 | 8 | FP32 | 608x608 | 190 |
| YOLOv5-m | Xeon E5-2620/GTX1080 | 1 | FP32 | 608x608 | 71 |
| YOLOv5-l | Xeon E5-2620/GTX1080 | 1 | FP32 | 608x608 | 43 |
| YOLOv5-x | Xeon E5-2620/GTX1080 | 1 | FP32 | 608x608 | 29 |
| RetinaFace(resnet50) | Xeon E5-2620/GTX1080 | 1 | FP32 | 480x640 | 90 |
| RetinaFace(resnet50) | Xeon E5-2620/GTX1080 | 1 | INT8 | 480x640 | 204 |
| RetinaFace(mobilenet0.25) | Xeon E5-2620/GTX1080 | 1 | FP32 | 480x640 | 417 |
| ArcFace(LResNet50E-IR) | Xeon E5-2620/GTX1080 | 1 | FP32 | 112x112 | 333 |
| CRNN | Xeon E5-2620/GTX1080 | 1 | FP32 | 32x100 | 1000 |

Help wanted: if you have speed results, please open an issue or PR.
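
When reporting a speed result, it helps to measure it the same way each time. The sketch below is one hedged way to do that in Python: run a few warm-up iterations, time a fixed number of calls with `time.perf_counter`, and report images per second as batch size × iterations / elapsed seconds. Counting FPS in images rather than batches is an assumption about how the table above was measured, and `infer` is a hypothetical stand-in for whatever inference call you use; for GPU work it must block until results are ready, otherwise the timing only captures kernel launches.

```python
import time
from typing import Callable


def fps_from_timing(batch_size: int, iterations: int, elapsed_s: float) -> float:
    """Images per second, assuming FPS counts images rather than batches."""
    if elapsed_s <= 0:
        raise ValueError("elapsed time must be positive")
    return batch_size * iterations / elapsed_s


def benchmark(infer: Callable[[], None], batch_size: int,
              warmup: int = 10, iterations: int = 100) -> float:
    """Time a blocking `infer` call (hypothetical) and return images per second."""
    for _ in range(warmup):
        infer()  # let clocks, caches and lazy initialisation settle
    start = time.perf_counter()
    for _ in range(iterations):
        infer()
    elapsed = time.perf_counter() - start
    return fps_from_timing(batch_size, iterations, elapsed)


if __name__ == "__main__":
    # Dummy workload standing in for a real engine execution.
    print(benchmark(lambda: time.sleep(0.001), batch_size=4, warmup=2, iterations=20))
```

Quoting batch size, precision mode and input shape next to the FPS figure, as the table does, keeps contributed numbers comparable.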

Acknowledgments & Contact

Currently, this repo is funded by Alleyes-THU AI Lab (about us, in Chinese). We are based at Tsinghua University, Beijing, and are seeking talented interns for CV R&D. Contact me if you are interested.

Any contributions, questions and discussions are welcome; contact me via the following info.

E-mail: [email protected]

WeChat ID: wangxinyu0375 (you can add me on WeChat to join the tensorrtx discussion group; please include the note: tensorrtx)

Contributors

wang-xinyu, baofengzan, chufei1995, qiuyunzhe, upczww, chrystlemyrnalobo, koenvandesande, cesarandreslopez, hwh-hit
