
fps of Pelee on TX1 (issue on pelee, open, 10 comments)

robert-junwang commented on July 24, 2024
fps of Pelee on TX1

from pelee.

Comments (10)

Robert-JunWang commented on July 24, 2024

PeleeNet is built in a multi-branch, narrow-channel style. TensorRT can fuse many of PeleeNet's small branches into a large layer and greatly speed up inference. I do not have a TX1, but the result on TX2 is not bad: this version of Pelee (304x304) can run at over 71 FPS on TX2 with TensorRT 3.0. With some small changes to the architecture, it can run at over 104 FPS.

Robert-JunWang commented on July 24, 2024

I did not do any special processing; I just converted the merged Caffe model to a TensorRT engine file. That speed is in FP32 mode; the FP16 model runs at over 100 FPS. I am surprised that you can run MobileNet+SSD at over 54 FPS on TensorRT 3 with grouped conv. In my experiments, TensorRT 3.0 had very bad performance for grouped conv; it was even much slower than NVCaffe running on the CPU. Grouped-conv performance is improved greatly in TensorRT 4: MobileNet+SSD runs at a speed similar to Pelee in FP32 mode on TensorRT 4. However, MobileNet cannot benefit from FP16 inference on TX2; the model in FP16 mode runs at almost the same speed as in FP32 mode.

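For readers wondering what "converted the merged Caffe model to a TensorRT engine file" involves, a minimal sketch with the TensorRT 3/4-era C++ API looks roughly like this. The file paths, output blob name, and logger are assumptions, not taken from the repo, and TensorRT 3 used setHalf2Mode() where TensorRT 4 added setFp16Mode():

```cpp
// Minimal sketch: parse a BN-merged Caffe model and build a TensorRT engine.
// Paths and the output blob name ("detection_out") are placeholders.
#include "NvInfer.h"
#include "NvCaffeParser.h"
#include <iostream>

using namespace nvinfer1;
using namespace nvcaffeparser1;

class Logger : public ILogger {
    void log(Severity severity, const char* msg) override {
        if (severity != Severity::kINFO) std::cerr << msg << std::endl;
    }
} gLogger;

ICudaEngine* buildEngine(bool fp16) {
    IBuilder* builder = createInferBuilder(gLogger);
    INetworkDefinition* network = builder->createNetwork();
    ICaffeParser* parser = createCaffeParser();

    // Parse the merged deploy prototxt + weights into the network definition.
    auto blobs = parser->parse("pelee_merged.prototxt", "pelee_merged.caffemodel",
                               *network, DataType::kFLOAT);
    network->markOutput(*blobs->find("detection_out"));  // assumed output blob

    builder->setMaxBatchSize(1);
    builder->setMaxWorkspaceSize(1 << 24);
    if (fp16 && builder->platformHasFastFp16())
        builder->setFp16Mode(true);  // TensorRT 4; TensorRT 3 used setHalf2Mode(true)

    ICudaEngine* engine = builder->buildCudaEngine(*network);
    parser->destroy();
    network->destroy();
    builder->destroy();
    return engine;  // engine->serialize() produces the deployable engine file
}
```

This path only works once the SSD-specific layers the parser cannot handle are provided as plugins, which comes up later in the thread.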
lucheng07082221 commented on July 24, 2024

@xonobo How do you transfer Pelee to TensorRT? Can you share your experience?

lqs19881030 commented on July 24, 2024

@Robert-JunWang Can you tell me what changes to the architecture let it run at over 104 FPS?

Ghustwb commented on July 24, 2024

> PeleeNet is built in a multi-branch, narrow-channel style. TensorRT can fuse many of PeleeNet's small branches into a large layer and greatly speed up inference. I do not have a TX1, but the result on TX2 is not bad: this version of Pelee (304x304) can run at over 71 FPS on TX2 with TensorRT 3.0. With some small changes to the architecture, it can run at over 104 FPS.

It is so cool!!

Ghustwb commented on July 24, 2024

@Robert-JunWang Hi,
Thanks for your work. I only got 48 FPS on TX2 + TensorRT 3.0.4, which is slower than MobileNet-SSD (54 FPS, with grouped conv).
You can run at 70+ FPS; can you share your experience?
And can you tell me how to change the architecture to get over 104 FPS?
Thanks

Ghustwb commented on July 24, 2024

> I did not do any special processing; I just converted the merged Caffe model to a TensorRT engine file. That speed is in FP32 mode; the FP16 model runs at over 100 FPS. I am surprised that you can run MobileNet+SSD at over 54 FPS on TensorRT 3 with grouped conv. In my experiments, TensorRT 3.0 had very bad performance for grouped conv; it was even much slower than NVCaffe running on the CPU. Grouped-conv performance is improved greatly in TensorRT 4: MobileNet+SSD runs at a speed similar to Pelee in FP32 mode on TensorRT 4. However, MobileNet cannot benefit from FP16 inference on TX2; the model in FP16 mode runs at almost the same speed as in FP32 mode.

Grouped conv has been optimized in cuDNN 7; the inference time of grouped conv depends on the cuDNN library. I think that with the same cuDNN version, whether you use TensorRT 3 or TensorRT 4, the time cost is the same.
Yes, you are right: MobileNet cannot benefit from FP16 inference on TX2, and the model in FP16 mode runs at almost the same speed as in FP32 mode. On my TX2, MobileNet runs at 50 FPS in FP32 and 54 FPS in FP16.
Thanks for your reply, I will retry it.

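For context on what a grouped conv is in this setting: in Caffe it is declared with the group field of a Convolution layer, and group == num_output gives the depthwise convolutions MobileNet is built from. A sketch of such a layer (layer and blob names are made up) looks like:

```
# Caffe prototxt fragment (illustrative names): a depthwise/grouped convolution
layer {
  name: "conv_dw"
  type: "Convolution"
  bottom: "conv_in"
  top: "conv_dw"
  convolution_param {
    num_output: 32
    kernel_size: 3
    pad: 1
    group: 32   # group == num_output -> depthwise convolution
  }
}
```

Whether this layer is fast then depends on how the backend (cuDNN, or TensorRT's own kernels) implements the grouped case, which is exactly the difference being discussed above.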
Robert-JunWang commented on July 24, 2024

I guess you did not use jetson_clocks.sh to maximize the GPU and CPU clock speeds. After setting them, both Pelee and SSD+MobileNet run at over 70 FPS in FP32 mode. Pelee runs slightly faster than SSD+MobileNet in FP32 mode and much faster in FP16 mode on my TX2 (TensorRT 4).

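For anyone reproducing these numbers, the clock-maximizing step on a JetPack-era TX2 is typically the following (script location varies by JetPack release; this is a sketch, not from the repo):

```shell
# Run before benchmarking; requires root.
sudo nvpmodel -m 0       # select the MAXN (max-performance) power model
sudo ~/jetson_clocks.sh  # pin CPU/GPU/EMC clocks at their maximums
```

Without this, the TX2's DVFS governor can leave clocks well below peak during short benchmarks, which is consistent with the 48 FPS vs 70+ FPS gap reported above.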
xonobo commented on July 24, 2024

May I ask a more generic question about the TX deployments? As far as I know, TensorRT is missing some layers used in SSD, like Reshape, PriorBox, and DetectionOutput. In your TX timing experiments, how did you overcome this? Did you implement your own TensorRT plugin layers for the missing ones, or did you use some available code?

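For context, the TensorRT 3-era answer to missing layers was the IPlugin interface. A minimal identity ("reshape-like") plugin might look roughly as below; this is a sketch of the interface only, with a copy-through enqueue() standing in for real PriorBox/DetectionOutput logic:

```cpp
// Sketch of a TensorRT 3-era IPlugin for a layer the Caffe parser lacks.
// A real PriorBox/DetectionOutput plugin would do actual work in enqueue().
#include "NvInfer.h"
#include <cuda_runtime.h>
#include <cstring>

class ReshapePlugin : public nvinfer1::IPlugin {
public:
    int getNbOutputs() const override { return 1; }

    nvinfer1::Dims getOutputDimensions(int, const nvinfer1::Dims* inputs, int) override {
        return inputs[0];  // identity: same volume, caller reinterprets the shape
    }

    void configure(const nvinfer1::Dims* inputDims, int,
                   const nvinfer1::Dims*, int, int) override {
        mSize = 1;  // cache the per-sample element count for enqueue()
        for (int i = 0; i < inputDims[0].nbDims; ++i) mSize *= inputDims[0].d[i];
    }

    int initialize() override { return 0; }
    void terminate() override {}
    size_t getWorkspaceSize(int) const override { return 0; }

    int enqueue(int batchSize, const void* const* inputs, void** outputs,
                void*, cudaStream_t stream) override {
        cudaMemcpyAsync(outputs[0], inputs[0], batchSize * mSize * sizeof(float),
                        cudaMemcpyDeviceToDevice, stream);
        return 0;
    }

    size_t getSerializationSize() override { return sizeof(mSize); }
    void serialize(void* buffer) override { std::memcpy(buffer, &mSize, sizeof(mSize)); }

private:
    size_t mSize{0};
};
```

Such plugins are registered with the Caffe parser via an IPluginFactory so the unsupported layer types resolve during parsing; the repository linked in the next comment takes this approach.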
cathy-kim commented on July 24, 2024

@xonobo I uploaded my TensorRT code for Pelee here: https://github.com/ginn24/Pelee-TensorRT
