Comments (10)
PeleeNet is built in a multi-branch, narrow-channel style. TensorRT can merge many of PeleeNet's small branches into larger layers and greatly speed up inference. I do not have a TX1, but the result on a TX2 is not bad: this version of Pelee (304x304) runs at over 71 FPS on TX2 + TensorRT 3.0. With some small changes to the architecture, it can run at over 104 FPS.
from pelee.
I did not do any specific processing; I just converted the merged Caffe model to a TensorRT engine file. That speed is for FP32; the FP16 model runs at over 100 FPS. I am surprised that you can run MobileNet+SSD at over 54 FPS on TensorRT 3 with grouped conv. In my experiments, TensorRT 3.0 has very poor performance for grouped conv; it is even much slower than NVCaffe running on the CPU. Grouped-conv performance is greatly improved in TensorRT 4: MobileNet+SSD runs at a similar speed to Pelee in FP32 mode. However, MobileNet cannot benefit from FP16 inference on the TX2; the model in FP16 mode runs at almost the same speed as in FP32 mode.
from pelee.
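For reference, the Caffe-to-TensorRT conversion described above can be sketched with the `trtexec` sample tool that ships with TensorRT. The file names below are placeholders, flag names vary across TensorRT versions (older releases used `--half2` instead of `--fp16`), and the SSD-specific layers (PriorBox, DetectionOutput) are not handled by the stock Caffe parser and need plugin implementations, so this is only an outline of the workflow, not a turnkey command:

```shell
# Build a TensorRT engine from the merged (BN-folded) Caffe model.
# File names are placeholders; adjust to your own deploy/model files.
trtexec --deploy=pelee_merged.prototxt \
        --model=pelee_merged.caffemodel \
        --output=detection_out \
        --batch=1 \
        --saveEngine=pelee_fp32.engine

# Same conversion with FP16 kernels enabled (supported on the TX2's Pascal GPU).
trtexec --deploy=pelee_merged.prototxt \
        --model=pelee_merged.caffemodel \
        --output=detection_out \
        --batch=1 \
        --fp16 \
        --saveEngine=pelee_fp16.engine
```

`trtexec` also reports per-iteration latency, which is how FPS numbers like the ones quoted in this thread are usually measured.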
@xonobo How did you transfer Pelee to TensorRT? Can you share your experience?
@Robert-JunWang Can you tell me what changes to the architecture let it run at over 104 FPS?
It is so cool!!
@Robert-JunWang Hi,
Thanks for your work.
I only got 48 FPS on TX2 + TensorRT 3.0.4; it is slower than MobileNet-SSD (54 FPS, with grouped conv).
You ran at 70+ FPS; can you share your experience?
And can you tell me how to change the architecture to get over 104 FPS?
Thanks
Grouped conv has been optimized in cuDNN 7; the inference time of grouped conv depends on the cuDNN library. I think that with the same cuDNN version, the time cost is the same whether you use TensorRT 3 or TensorRT 4.
Yes, you are right. MobileNet cannot benefit from FP16 inference on TX2; the model in FP16 mode runs at almost the same speed as in FP32 mode. On my TX2, MobileNet runs at 50 FPS in FP32 and 54 FPS in FP16.
Thanks for your reply, I will retry it.
I guess you did not use jetson_clocks.sh to maximize the GPU and CPU clock speeds. After that setting, both Pelee and SSD+MobileNet run at over 70 FPS in FP32 mode. Pelee runs slightly faster than SSD+MobileNet in FP32 mode, and much faster in FP16 mode, on my TX2 (TensorRT 4).
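For anyone reproducing these numbers, maximizing the TX2 clocks before benchmarking looks roughly like this (the nvpmodel mode numbers are JetPack-specific; mode 0 is MAX-N on the TX2):

```shell
# Select the MAX-N power profile (all CPU cores on, maximum GPU frequency).
sudo nvpmodel -m 0

# Pin CPU, GPU, and EMC clocks to their maximum rates.
# The script ships in the home directory on stock JetPack installs.
sudo ~/jetson_clocks.sh

# Optional: verify the active power model.
sudo nvpmodel -q
```

Without these steps the TX2 runs in a power-saving profile, which easily accounts for a gap like 48 FPS versus 70+ FPS on the same model.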
May I ask a more generic question about the TX deployments? As far as I know, TensorRT is missing some layers used in SSD, like Reshape, PriorBox, and DetectionOutput. In your TX timing experiments, how did you overcome this issue? Did you implement your own TensorRT plugin layers for the missing ones, or did you use some available code?
@xonobo I uploaded my TensorRT code for Pelee here: https://github.com/ginn24/Pelee-TensorRT