Comments (10)
PeleeNet is built in a multi-branch, narrow-channel style. TensorRT can merge many of PeleeNet's small branches into larger layers and greatly speed up inference. I do not have a TX1, but the result on a TX2 is not bad: this version of Pelee (304x304) runs at over 71 FPS on TX2 + TensorRT 3.0. With some small changes to the architecture, it can run at over 104 FPS.
from pelee.
I did not do any specific processing; I just converted the merged Caffe model to a TensorRT engine file. That speed is for FP32; the FP16 model runs at over 100 FPS. I am surprised that you can run MobileNet+SSD at over 54 FPS on TensorRT 3 with grouped conv. In my experiments, TensorRT 3.0 has very poor performance for grouped conv; it is even much slower than NVCaffe running on the CPU. Grouped-conv performance is greatly improved in TensorRT 4: MobileNet+SSD runs at a similar speed to Pelee in FP32 mode. However, MobileNet cannot benefit from FP16 inference on the TX2; the model in FP16 mode runs at almost the same speed as in FP32 mode.
from pelee.
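For reference, the Caffe-to-TensorRT conversion described above can be sketched with the `trtexec` sample tool that ships with TensorRT. The file names below are placeholders, flag names vary across TensorRT versions (older releases used `--half2` instead of `--fp16`), and the SSD-specific layers (PriorBox, DetectionOutput) are not handled by the stock Caffe parser and need plugin implementations, so this is only an outline of the workflow, not a turnkey command:

```shell
# Build a TensorRT engine from the merged (BN-folded) Caffe model.
# File names are placeholders; adjust to your own deploy/model files.
trtexec --deploy=pelee_merged.prototxt \
        --model=pelee_merged.caffemodel \
        --output=detection_out \
        --batch=1 \
        --saveEngine=pelee_fp32.engine

# Same conversion with FP16 kernels enabled (supported on the TX2's Pascal GPU).
trtexec --deploy=pelee_merged.prototxt \
        --model=pelee_merged.caffemodel \
        --output=detection_out \
        --batch=1 \
        --fp16 \
        --saveEngine=pelee_fp16.engine
```

`trtexec` also reports per-iteration latency, which is how FPS numbers like the ones quoted in this thread are usually measured.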
@xonobo How did you transfer Pelee to TensorRT? Can you share your experience?
@Robert-JunWang Can you tell me what changes to the architecture let it run at over 104 FPS?
It is so cool!!
@Robert-JunWang Hi,
Thanks for your work.
I only got 48 FPS on TX2 + TensorRT 3.0.4; it is slower than MobileNet-SSD (54 FPS, with grouped conv).
You ran at 70+ FPS; can you share your experience?
And can you tell me how to change the architecture to get over 104 FPS?
Thanks
Grouped conv has been optimized in cuDNN 7; the inference time of grouped conv depends on the cuDNN library. I think that with the same cuDNN version, the time cost is the same whether you use TensorRT 3 or TensorRT 4.
Yes, you are right. MobileNet cannot benefit from FP16 inference on TX2; the model in FP16 mode runs at almost the same speed as in FP32 mode. On my TX2, MobileNet runs at 50 FPS in FP32 and 54 FPS in FP16.
Thanks for your reply, I will retry it.
I guess you did not use jetson_clocks.sh to maximize the GPU and CPU clock speeds. After that setting, both Pelee and SSD+MobileNet run at over 70 FPS in FP32 mode. Pelee runs slightly faster than SSD+MobileNet in FP32 mode, and much faster in FP16 mode, on my TX2 (TensorRT 4).
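For anyone reproducing these numbers, maximizing the TX2 clocks before benchmarking looks roughly like this (the nvpmodel mode numbers are JetPack-specific; mode 0 is MAX-N on the TX2):

```shell
# Select the MAX-N power profile (all CPU cores on, maximum GPU frequency).
sudo nvpmodel -m 0

# Pin CPU, GPU, and EMC clocks to their maximum rates.
# The script ships in the home directory on stock JetPack installs.
sudo ~/jetson_clocks.sh

# Optional: verify the active power model.
sudo nvpmodel -q
```

Without these steps the TX2 runs in a power-saving profile, which easily accounts for a gap like 48 FPS versus 70+ FPS on the same model.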
May I ask a more generic question about the TX deployments? As far as I know, TensorRT is missing some layers used in SSD, like Reshape, PriorBox, and DetectionOutput. In your TX timing experiments, how did you overcome this issue? Did you implement your own TensorRT plugin layers for the missing ones, or did you use some available code?
@xonobo I uploaded my TensorRT code for Pelee here: https://github.com/ginn24/Pelee-TensorRT