SCRFD face detection TensorRT

This is an implementation of SCRFD face detection using the NVIDIA TensorRT C++ API.

This repo is based on InsightFace and TensorRTX.

Export SCRFD to ONNX

Clone the repository from https://github.com/deepinsight/insightface
In the file detection/scrfd/mmdet/models/dense_heads/scrfd_head.py, replace these lines

batch_size = cls_score.shape[0]
cls_score = cls_score.permute(0, 2, 3, 1).reshape(batch_size, -1, self.cls_out_channels).sigmoid()
bbox_pred = bbox_pred.permute(0, 2, 3, 1).reshape(batch_size, -1, 4)
kps_pred = kps_pred.permute(0, 2, 3, 1).reshape(batch_size, -1, 10)

with

cls_score = cls_score.sigmoid()

Download the checkpoint of the desired model from https://github.com/deepinsight/insightface/blob/master/detection/scrfd/README.md
Generate the ONNX file using scrfd2onnx.py, for example

python detection/scrfd/tools/scrfd2onnx.py detection/scrfd/configs/scrfd/scrfd_2.5g_bnkps.py <path_to_ckpt>

After this step, check that the generated ONNX file scrfd_2.5g_bnkps_shape640x640.onnx exists in detection/scrfd/onnx/

Create the TensorRT engine and run inference

Clone the scrfd_tensorRT repository
Copy scrfd_2.5g_bnkps_shape640x640.onnx into the models folder
Generate the engine

alias trtexec="/usr/src/tensorrt/bin/trtexec"
trtexec --onnx=scrfd_2.5g_bnkps_shape640x640.onnx \
        --saveEngine=scrfd_2.5g_bnkps_shape640x640.trt \
        --fp16

Verify the generated engine

polygraphy inspect model scrfd_2.5g_bnkps_shape640x640.trt --model-type=engine

Check whether the input and output names and shapes are correct

Binding Index: 0 (Input)  [Name: input.1]  | Shapes: min=(1, 3, 640, 640), opt=(1, 3, 640, 640), max=(1, 3, 640, 640)
Binding Index: 1 (Output) [Name: bbox_8]   | Shape: (1, 8, 80, 80)
Binding Index: 2 (Output) [Name: kps_8]    | Shape: (1, 20, 80, 80)
Binding Index: 3 (Output) [Name: score_8]  | Shape: (1, 2, 80, 80)
Binding Index: 4 (Output) [Name: bbox_16]  | Shape: (1, 8, 40, 40)
Binding Index: 5 (Output) [Name: kps_16]   | Shape: (1, 20, 40, 40)
Binding Index: 6 (Output) [Name: score_16] | Shape: (1, 2, 40, 40)
Binding Index: 7 (Output) [Name: bbox_32]  | Shape: (1, 8, 20, 20)
Binding Index: 8 (Output) [Name: kps_32]   | Shape: (1, 20, 20, 20)
Binding Index: 9 (Output) [Name: score_32] | Shape: (1, 2, 20, 20)

They should match the names and shapes defined in scrfd_trt.h:

const char* INPUT_BLOB_NAME = "input.1";

const char* OUTPUT_BBOX_8_BLOB_NAME = "bbox_8";
const char* OUTPUT_KPS_8_BLOB_NAME = "kps_8";
const char* OUTPUT_SCORE_8_BLOB_NAME = "score_8";

const char* OUTPUT_BBOX_16_BLOB_NAME = "bbox_16";
const char* OUTPUT_KPS_16_BLOB_NAME = "kps_16";
const char* OUTPUT_SCORE_16_BLOB_NAME = "score_16";

const char* OUTPUT_BBOX_32_BLOB_NAME = "bbox_32";
const char* OUTPUT_KPS_32_BLOB_NAME = "kps_32";
const char* OUTPUT_SCORE_32_BLOB_NAME = "score_32";

and

const int INPUT_H = 640;
const int INPUT_W = 640;
const int INPUT_SIZE = 3 * INPUT_W * INPUT_H;

const int OUTPUT_BBOX_8_SIZE = 8 * 80 * 80;
const int OUTPUT_KPS_8_SIZE = 20 * 80 * 80;
const int OUTPUT_SCORE_8_SIZE = 2 * 80 * 80;

const int OUTPUT_BBOX_16_SIZE = 8 * 40 * 40;
const int OUTPUT_KPS_16_SIZE = 20 * 40 * 40;
const int OUTPUT_SCORE_16_SIZE = 2 * 40 * 40;

const int OUTPUT_BBOX_32_SIZE = 8 * 20 * 20;
const int OUTPUT_KPS_32_SIZE = 20 * 20 * 20;
const int OUTPUT_SCORE_32_SIZE = 2 * 20 * 20;
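The sizes above follow a regular pattern, so they can be cross-checked before exporting a model with a different input resolution. The sketch below is only a sanity check, not part of this repo. It assumes strides of 8, 16 and 32 and two anchors per feature-map cell, which is inferred from the shapes listed above (8 = 2 x 4 box values, 20 = 2 x 10 landmark values, 2 = 2 x 1 score). The helper names expected_shapes and flat_size are hypothetical.

```python
# Assumptions (inferred from the shapes above, not stated in the README):
# strides 8/16/32, two anchors per feature-map cell, 4 box distances,
# 5 landmarks with (x, y) each, and 1 class score per anchor.
STRIDES = (8, 16, 32)
NUM_ANCHORS = 2
VALUES_PER_ANCHOR = {"bbox": 4, "kps": 10, "score": 1}


def expected_shapes(input_h=640, input_w=640):
    """Return {output_name: (1, C, H, W)} for every head output."""
    shapes = {}
    for stride in STRIDES:
        feat_h, feat_w = input_h // stride, input_w // stride
        for name, per_anchor in VALUES_PER_ANCHOR.items():
            channels = NUM_ANCHORS * per_anchor
            shapes[f"{name}_{stride}"] = (1, channels, feat_h, feat_w)
    return shapes


def flat_size(shape):
    """Number of elements in a shape, comparable to the OUTPUT_*_SIZE constants."""
    size = 1
    for dim in shape:
        size *= dim
    return size


if __name__ == "__main__":
    for name, shape in expected_shapes().items():
        print(name, shape, flat_size(shape))
```

If the printed shapes differ from what polygraphy reports for a given engine, the constants in scrfd_trt.h most likely need updating as well.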

Build

mkdir build
cd build
cmake ..
make -j4

Run inference by specifying the engine and an input image, for example

./build/scrfd models/scrfd_2.5g_bnkps_shape640x640.trt test_images/worlds-largest-selfie.jpg
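For readers wondering how the raw engine outputs turn into boxes, here is a rough sketch of decoding one stride level. It is not the code used by this repo. It assumes the layout implied by the export step: flat buffers in channel-major (C, H, W) order, channel a * 4 + k holding distance k (left, top, right, bottom) for anchor a, distances expressed in units of the stride, and scores already passed through a sigmoid. The function name decode_level and the default threshold are hypothetical, landmark decoding is omitted, and non-maximum suppression would still be needed afterwards.

```python
def decode_level(scores, bboxes, stride, feat_h, feat_w,
                 num_anchors=2, score_thresh=0.5):
    """Decode one stride level from flat CHW buffers.

    scores: flat list of length num_anchors * feat_h * feat_w
    bboxes: flat list of length num_anchors * 4 * feat_h * feat_w
    Returns a list of (x1, y1, x2, y2, score) in input-image pixels.
    """
    plane = feat_h * feat_w  # elements in one channel
    detections = []
    for a in range(num_anchors):
        for y in range(feat_h):
            for x in range(feat_w):
                cell = y * feat_w + x
                score = scores[a * plane + cell]
                if score < score_thresh:
                    continue
                # Assumed layout: channel (a * 4 + k) is distance k of anchor a.
                left, top, right, bottom = (
                    bboxes[(a * 4 + k) * plane + cell] * stride
                    for k in range(4)
                )
                # Anchor centre is the cell origin scaled back to pixels.
                cx, cy = x * stride, y * stride
                detections.append(
                    (cx - left, cy - top, cx + right, cy + bottom, score))
    return detections
```

With the 640x640 engine above, this would be called once per level, for example with stride 8 and an 80x80 feature map for score_8 and bbox_8, and the results from all three levels merged before suppression.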

Results

Sample output images from scrfd_2.5g_bnkps model

FPS measurement

Run inference multiple times to measure the FPS, for example

./build/scrfd models/scrfd_2.5g_bnkps_shape640x640.trt test_images/worlds-largest-selfie.jpg 1000
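The trailing 1000 in the command above appears to be the number of iterations. How the binary times them (for example, whether warm-up runs or pre- and post-processing are included) is not stated here. As a generic illustration of the idea, averaging over many runs can be sketched as follows. The function measure_fps and its warmup parameter are hypothetical, and run_once stands in for a single inference call.

```python
import time


def measure_fps(run_once, iterations=1000, warmup=10):
    """Call run_once() repeatedly and return the average frames per second."""
    for _ in range(warmup):
        run_once()  # untimed warm-up runs
    start = time.perf_counter()
    for _ in range(iterations):
        run_once()
    elapsed = time.perf_counter() - start
    return iterations / elapsed


if __name__ == "__main__":
    # Dummy workload standing in for one inference call.
    print(f"{measure_fps(lambda: sum(range(1000))):.1f} FPS")
```

Averaging over many iterations smooths out per-call jitter, which is why a single timed run is usually not a reliable FPS figure.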
