medeasolution / tkdnn-python

This project forked from onyalcin/tkdnn

Deep neural network library and toolkit for high-performance inference on NVIDIA Jetson platforms

License: GNU General Public License v2.0

CMake 1.69% C++ 84.76% C 1.50% Cuda 10.01% Python 1.97% Dockerfile 0.06%

tkdnn-python's Introduction

tkDNN

tkDNN is a Deep Neural Network library built with cuDNN and TensorRT primitives. Its main goal is to exploit NVIDIA boards as much as possible to obtain the best inference performance. It does not support training.

Dependencies

This branch works on every NVIDIA GPU that supports the dependencies:

  • CUDA 10.0
  • cuDNN 7.6.3
  • TensorRT 6.0.1
  • OpenCV 3.4
  • yaml-cpp 0.5.2 (sudo apt install libyaml-cpp-dev)

About OpenCV

OpenCV is required to compile this repository. It is probably already installed; if not, follow the steps in the ai-frame-manager repository.

When using an OpenCV build compiled without the contrib modules, comment out the definition of OPENCV_CUDACONTRIBCONTRIB in include/tkDNN/DetectionNN.h. When it is commented out, the network preprocessing runs on the CPU; otherwise it runs on the GPU, which saves a few milliseconds of end-to-end latency.
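
To make "preprocessing" concrete, here is a minimal pure-Python sketch of the kind of work the CPU path has to do for a YOLO-style network. It assumes a resize to the network input size, a BGR to RGB swap, scaling to [0, 1] and an HWC to CHW layout change; these steps are an assumption about a typical detector, not a copy of tkDNN's code. The functions `resize_nearest` and `preprocess` are hypothetical and use nearest-neighbour resizing for simplicity, whereas OpenCV's `resize` defaults to bilinear interpolation.

```python
# Hypothetical CPU preprocessing sketch (no OpenCV, no NumPy).
# Assumed YOLO-style steps: resize -> BGR to RGB -> scale to [0, 1] -> HWC to CHW.


def resize_nearest(img, out_h, out_w):
    """Nearest-neighbour resize of an H x W x C nested list of pixels."""
    in_h, in_w = len(img), len(img[0])
    return [
        [img[y * in_h // out_h][x * in_w // out_w] for x in range(out_w)]
        for y in range(out_h)
    ]


def preprocess(img_bgr, net_h, net_w):
    """Return a C x H x W nested list of floats in [0, 1], channels in RGB order."""
    resized = resize_nearest(img_bgr, net_h, net_w)
    chw = []
    for c in (2, 1, 0):  # read B, G, R pixel components in reverse to get R, G, B planes
        chw.append([[px[c] / 255.0 for px in row] for row in resized])
    return chw


# Usage: a 2x2 BGR image (blue, green / red, white) upscaled to a 4x4 input.
image = [[(255, 0, 0), (0, 255, 0)], [(0, 0, 255), (255, 255, 255)]]
planes = preprocess(image, 4, 4)
print(len(planes), len(planes[0]), len(planes[0][0]))
```

Running these loops per pixel on the CPU is exactly the cost that the GPU path avoids, which is why enabling the contrib build trims end-to-end latency.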

How to compile this repo

Build with CMake:

git clone https://github.com/medeasolution/tkDNN-python
cd tkDNN-python
mkdir build
cd build
cmake .. 
make -j8
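
The same build steps can be scripted, for example from a setup or CI script. The sketch below assumes the repository is already cloned into `src_dir`; `build_steps` and `run_build` are hypothetical helpers, and the job count defaults to the number of CPU cores instead of the fixed `-j8` above.

```python
import os
import subprocess


def build_steps(src_dir, jobs=None):
    """Return (cwd, argv) pairs equivalent to `mkdir build && cd build && cmake .. && make -jN`."""
    build_dir = os.path.join(src_dir, "build")
    jobs = jobs or os.cpu_count() or 1  # fall back to 1 if the core count is unknown
    return [
        (build_dir, ["cmake", ".."]),
        (build_dir, ["make", f"-j{jobs}"]),
    ]


def run_build(src_dir, jobs=None, dry_run=False):
    """Print each step and, unless dry_run is set, execute it (stopping on the first failure)."""
    steps = build_steps(src_dir, jobs)
    for cwd, argv in steps:
        print(f"[{cwd}] {' '.join(argv)}")
        if not dry_run:
            os.makedirs(cwd, exist_ok=True)  # the `mkdir build` step
            subprocess.run(argv, cwd=cwd, check=True)
    return steps


# Usage: preview the commands without invoking cmake or make.
run_build("tkDNN-python", jobs=8, dry_run=True)
```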

Building an engine for inference

Check docs/build_engine.md

Demo

Check docs/demo.md

Python

The most important files are:

  • demo/darknetTR.cpp and its headers, which define (as extern) the functions called from Python.
  • darknetTR.py, which defines the structures used to call these C functions.
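
For readers new to this pattern, the sketch below shows how a C struct returned by an extern function is typically mirrored in Python with `ctypes`. The struct layout (`Detection` with a class index, box, probability and a 20-byte name) is hypothetical and is not taken from darknetTR.cpp; the real field order and types must match the C definition exactly, or ctypes will read garbage. The library path and function name in the comments are placeholders.

```python
import ctypes


class Detection(ctypes.Structure):
    """Hypothetical mirror of a C struct such as:
    struct detection { int cl; float x, y, w, h; float prob; char name[20]; };"""
    _fields_ = [
        ("cl", ctypes.c_int),          # class index
        ("x", ctypes.c_float),         # box left
        ("y", ctypes.c_float),         # box top
        ("w", ctypes.c_float),         # box width
        ("h", ctypes.c_float),         # box height
        ("prob", ctypes.c_float),      # confidence
        ("name", ctypes.c_char * 20),  # NUL-terminated class name, read back as bytes
    ]


def to_python(dets, count):
    """Copy `count` entries from a ctypes array or POINTER(Detection) into plain tuples."""
    out = []
    for i in range(count):
        d = dets[i]
        out.append((d.name.decode(), d.cl, d.prob, (d.x, d.y, d.w, d.h)))
    return out


# With a real shared library the wiring would look like this (placeholder names):
#   lib = ctypes.CDLL("build/libdarknetTR.so")
#   lib.hypothetical_detect.restype = ctypes.POINTER(Detection)
# Here a ctypes array stands in for the C-side buffer.
buf = (Detection * 1)()
buf[0].name = b"person"
buf[0].prob = 0.5
print(to_python(buf, 1))
```

Converting to plain tuples immediately keeps the rest of the Python code independent of the C memory, which may be reused or freed on the next inference call.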

To run the object detection demo from Python:

python darknetTR.py build/yolo4_fp16.rt --video=demo/yolo_test.mp4

FPS Results

Inference FPS of YOLOv4 with tkDNN, averaged over 1200 images with the same dimensions as the network input size, on:

  • RTX 2080Ti (CUDA 10.2, TensorRT 7.0.0, cuDNN 7.6.5);
  • Xavier AGX, JetPack 4.3 (CUDA 10.0, cuDNN 7.6.3, TensorRT 6.0.1);
  • TX2, JetPack 4.2 (CUDA 10.0, cuDNN 7.3.1, TensorRT 5.0.6);
  • Jetson Nano, JetPack 4.4 (CUDA 10.2, cuDNN 8.0.0, TensorRT 7.1.0).
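
| Platform   | Network   | FP32, B=1 | FP32, B=4 | FP16, B=1 | FP16, B=4 | INT8, B=1 | INT8, B=4 |
|------------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|
| RTX 2080Ti | yolo4 320 | 118.59    | 237.31    | 207.81    | 443.32    | 262.37    | 530.93    |
| RTX 2080Ti | yolo4 416 | 104.81    | 162.86    | 169.06    | 293.78    | 206.93    | 353.26    |
| RTX 2080Ti | yolo4 512 | 92.98     | 132.43    | 140.36    | 215.17    | 165.35    | 254.96    |
| RTX 2080Ti | yolo4 608 | 63.77     | 81.53     | 111.39    | 152.89    | 127.79    | 184.72    |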
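
FPS figures are easier to compare against a latency budget once converted to milliseconds per image and to speedups over FP32. The sketch below uses only the RTX 2080Ti numbers from the table; it assumes the B=4 columns report images per second (throughput), not batches per second, which the table itself does not state.

```python
# RTX 2080Ti yolo4 FPS from the table: {input size: {precision: (B=1, B=4)}}.
FPS = {
    320: {"FP32": (118.59, 237.31), "FP16": (207.81, 443.32), "INT8": (262.37, 530.93)},
    416: {"FP32": (104.81, 162.86), "FP16": (169.06, 293.78), "INT8": (206.93, 353.26)},
    512: {"FP32": (92.98, 132.43), "FP16": (140.36, 215.17), "INT8": (165.35, 254.96)},
    608: {"FP32": (63.77, 81.53), "FP16": (111.39, 152.89), "INT8": (127.79, 184.72)},
}


def ms_per_image(fps):
    """Average time per image in milliseconds for a given frames-per-second value."""
    return 1000.0 / fps


def speedup(size, precision, batch_index=0, baseline="FP32"):
    """Ratio of `precision` FPS to `baseline` FPS (batch_index 0 -> B=1, 1 -> B=4)."""
    return FPS[size][precision][batch_index] / FPS[size][baseline][batch_index]


for size in sorted(FPS):
    fp16_b1 = FPS[size]["FP16"][0]
    print(f"yolo4 {size}: FP16 B=1 {ms_per_image(fp16_b1):.2f} ms/image, "
          f"{speedup(size, 'FP16'):.2f}x over FP32, "
          f"INT8 {speedup(size, 'INT8'):.2f}x over FP32")
```

For example, yolo4 416 in FP16 with B=1 works out to roughly 5.9 ms per image, about 1.6 times faster than FP32.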

MAP Results

Results on COCO val 2017 (5k images), on an RTX 2080Ti, with confidence threshold = 0.001:

| Network              | CodaLab, tkDNN MAP(0.5:0.95) | CodaLab, tkDNN AP50 | CodaLab, darknet MAP(0.5:0.95) | CodaLab, darknet AP50 | tkDNN map, tkDNN MAP(0.5:0.95) | tkDNN map, tkDNN AP50 |
|----------------------|-------|-------|-------|-------|-------|-------|
| Yolov3 (416x416)     | 0.381 | 0.675 | 0.380 | 0.675 | 0.372 | 0.663 |
| yolov4 (416x416)     | 0.468 | 0.705 | 0.471 | 0.710 | 0.459 | 0.695 |
| yolov3tiny (416x416) | 0.096 | 0.202 | 0.096 | 0.201 | 0.093 | 0.198 |
| yolov4tiny (416x416) | 0.202 | 0.400 | 0.201 | 0.400 | 0.197 | 0.395 |
| Cnet-dla34 (512x512) | 0.366 | 0.543 | -     | -     | 0.361 | 0.535 |
| mv2SSD (512x512)     | 0.226 | 0.381 | -     | -     | 0.223 | 0.378 |
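
The two metric columns differ only in the IoU (intersection over union) threshold a detection must reach to count as a true positive: AP50 uses IoU ≥ 0.5, while MAP(0.5:0.95) is the COCO-style average over the ten thresholds 0.50, 0.55, ..., 0.95. The sketch below shows only that matching criterion; `iou` and `matches_at` are illustrative helpers, and the full precision-recall AP computation is not reproduced here.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# The ten COCO IoU thresholds; AP50 uses only the first one.
COCO_IOU_THRESHOLDS = [round(0.5 + 0.05 * i, 2) for i in range(10)]


def matches_at(pred, gt):
    """Thresholds at which `pred` would count as a true positive for `gt`."""
    overlap = iou(pred, gt)
    return [t for t in COCO_IOU_THRESHOLDS if overlap >= t]


# A box covering 80% of the ground truth passes 7 of the 10 thresholds,
# so it helps AP50 fully but only part of MAP(0.5:0.95).
print(matches_at((0, 0, 10, 10), (0, 0, 10, 8)))
```

This is why MAP(0.5:0.95) values in the table are always well below the AP50 values: tighter thresholds reject loosely localized boxes.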

Darknet Parser

tkDNN implements an easy-to-use parser for darknet cfg files; a network can be converted with tk::dnn::darknetParser:

// example of parsing yolo4
tk::dnn::Network *net = tk::dnn::darknetParser("yolov4.cfg", "yolov4/layers", "coco.names");
net->print();

All darknet models are now parsed directly from their cfg file; you still need to export the weights with the tools described in the previous section.

  • Supported layers: convolutional, maxpool, avgpool, shortcut, upsample, route, reorg, region, yolo
  • Supported activations: relu, leaky, mish
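
A darknet cfg file is a sequence of `[section]` headers followed by `key=value` options, so a quick pre-check for layers outside the lists above can be done before invoking the C++ parser. This is a hypothetical Python sketch, not tkDNN's parser: `parse_cfg` and `check_supported` are illustrative names, and treating darknet's `linear` (identity) activation as acceptable is an assumption.

```python
SUPPORTED_LAYERS = {"convolutional", "maxpool", "avgpool", "shortcut",
                    "upsample", "route", "reorg", "region", "yolo"}
SUPPORTED_ACTIVATIONS = {"relu", "leaky", "mish"}
# Assumption: "linear" means "no activation" and is accepted as a no-op.
IDENTITY_ACTIVATIONS = {"linear"}


def parse_cfg(text):
    """Split a darknet .cfg into [{"type": ..., "options": {...}}, ...] in file order."""
    sections = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#;":  # skip blank lines and comment lines
            continue
        if line.startswith("[") and line.endswith("]"):
            sections.append({"type": line[1:-1].strip(), "options": {}})
        elif "=" in line and sections:
            key, value = line.split("=", 1)
            sections[-1]["options"][key.strip()] = value.strip()
    return sections


def check_supported(sections):
    """Report layers or activations outside the supported lists."""
    # Layer indices skip the [net] header, matching darknet's numbering.
    layers = [s for s in sections if s["type"] not in ("net", "network")]
    problems = []
    for idx, layer in enumerate(layers):
        if layer["type"] not in SUPPORTED_LAYERS:
            problems.append(f"layer {idx}: unsupported layer type '{layer['type']}'")
        act = layer["options"].get("activation")
        if act and act not in SUPPORTED_ACTIVATIONS | IDENTITY_ACTIVATIONS:
            problems.append(f"layer {idx}: unsupported activation '{act}'")
    return problems


cfg = "[net]\nwidth=416\n[convolutional]\nfilters=32\nactivation=mish\n[yolo]\nclasses=80\n"
print(check_supported(parse_cfg(cfg)))
```

Counting layers without `[net]` keeps the reported indices consistent with the relative and absolute indices that darknet's `route` and `shortcut` layers use.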

Existing tests and supported networks

| Test Name             | Network                                | Dataset   | N Classes | Input size | Weights |
|-----------------------|----------------------------------------|-----------|-----------|------------|---------|
| yolo                  | YOLO v2 [1]                            | COCO 2014 | 80        | 608x608    | weights |
| yolo_224              | YOLO v2 [1]                            | COCO 2014 | 80        | 224x224    | weights |
| yolo_berkeley         | YOLO v2 [1]                            | BDD100K   | 10        | 416x736    | weights |
| yolo_relu             | YOLO v2 (with ReLU, not Leaky) [1]     | COCO 2014 | 80        | 416x416    | weights |
| yolo_tiny             | YOLO v2 tiny [1]                       | COCO 2014 | 80        | 416x416    | weights |
| yolo_voc              | YOLO v2 [1]                            | VOC       | 21        | 416x416    | weights |
| yolo3                 | YOLO v3 [2]                            | COCO 2014 | 80        | 416x416    | weights |
| yolo3_512             | YOLO v3 [2]                            | COCO 2017 | 80        | 512x512    | weights |
| yolo3_berkeley        | YOLO v3 [2]                            | BDD100K   | 10        | 320x544    | weights |
| yolo3_coco4           | YOLO v3 [2]                            | COCO 2014 | 4         | 416x416    | weights |
| yolo3_flir            | YOLO v3 [2]                            | FREE FLIR | 3         | 320x544    | weights |
| yolo3_tiny            | YOLO v3 tiny [2]                       | COCO 2014 | 80        | 416x416    | weights |
| yolo3_tiny512         | YOLO v3 tiny [2]                       | COCO 2017 | 80        | 512x512    | weights |
| dla34                 | Deep Layer Aggregation (DLA) 34 [3]    | COCO 2014 | 80        | 224x224    | weights |
| dla34_cnet            | CenterNet (DLA34 backbone) [4]         | COCO 2017 | 80        | 512x512    | weights |
| mobilenetv2ssd        | MobileNet v2 SSD Lite [5]              | VOC       | 21        | 300x300    | weights |
| mobilenetv2ssd512     | MobileNet v2 SSD Lite [5]              | COCO 2017 | 81        | 512x512    | weights |
| resnet101             | ResNet 101 [6]                         | COCO 2014 | 80        | 224x224    | weights |
| resnet101_cnet        | CenterNet (ResNet101 backbone) [4]     | COCO 2017 | 80        | 512x512    | weights |
| csresnext50-panet-spp | Cross Stage Partial Network [7]        | COCO 2014 | 80        | 416x416    | weights |
| yolo4                 | YOLOv4 [8]                             | COCO 2017 | 80        | 416x416    | weights |
| yolo4_berkeley        | YOLOv4 [8]                             | BDD100K   | 10        | 540x320    | weights |
| yolo4tiny             | YOLOv4 tiny                            | COCO 2017 | 80        | 416x416    | weights |

References

  1. Redmon, Joseph, and Ali Farhadi. "YOLO9000: better, faster, stronger." Proceedings of the IEEE conference on computer vision and pattern recognition. 2017.
  2. Redmon, Joseph, and Ali Farhadi. "YOLOv3: An incremental improvement." arXiv preprint arXiv:1804.02767 (2018).
  3. Yu, Fisher, et al. "Deep layer aggregation." Proceedings of the IEEE conference on computer vision and pattern recognition. 2018.
  4. Zhou, Xingyi, Dequan Wang, and Philipp Krähenbühl. "Objects as points." arXiv preprint arXiv:1904.07850 (2019).
  5. Sandler, Mark, et al. "MobileNetV2: Inverted residuals and linear bottlenecks." Proceedings of the IEEE conference on computer vision and pattern recognition. 2018.
  6. He, Kaiming, et al. "Deep residual learning for image recognition." Proceedings of the IEEE conference on computer vision and pattern recognition. 2016.
  7. Wang, Chien-Yao, et al. "CSPNet: A New Backbone that can Enhance Learning Capability of CNN." arXiv preprint arXiv:1911.11929 (2019).
  8. Bochkovskiy, Alexey, Chien-Yao Wang, and Hong-Yuan Mark Liao. "YOLOv4: Optimal Speed and Accuracy of Object Detection." arXiv preprint arXiv:2004.10934 (2020).

tkdnn-python's People

Contributors

alessiolei94, alexflorensa, ceccocats, fabiobagni, ioir123ju, mive93, omaralvarez, onyalcin, rcavicchioli, sapienzadavide
