Comments (11)
I'll try to answer based on what I know.
-
I believe TF-TRT was co-developed by NVIDIA and the Google TensorFlow team. Otherwise, what you've stated looks right.
-
Current releases of TensorRT support 3 kinds of "parsers": Caffe, UFF and ONNX. According to the "Deprecation of Caffe Parser and UFF Parser" paragraph in the TensorRT 7.0.0 Release Notes, the Caffe and UFF parsers are deprecated. The ONNX parser is the preferred one going forward.
In order to optimize a TensorFlow model, you have the option of converting the pb to either UFF or ONNX, and then to a TensorRT engine. In case there are layers in your model which are not supported by TensorRT directly (check this table), you could either: (1) replace those layers with plugins; or (2) leave those layers out when building the TensorRT engine, then take the TensorRT engine output and do postprocessing to complete the function of those layers.
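As a sketch of option (2), here is a minimal pure-Python illustration of greedy NMS done as host-side postprocessing on raw boxes/scores that an engine might output. The box format (x1, y1, x2, y2) and the threshold value are illustrative assumptions, not taken from the repo:

```python
def iou(a, b):
    """Intersection-over-union of two boxes in (x1, y1, x2, y2) format."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter > 0 else 0.0

def nms(boxes, scores, iou_thresh=0.5):
    """Greedy non-maximum suppression; returns indices of kept boxes."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it doesn't overlap too much with any kept box.
        if all(iou(boxes[i], boxes[j]) < iou_thresh for j in keep):
            keep.append(i)
    return keep
```

In a real deployment you'd feed this the decoded boxes and class scores copied back from the engine's output buffers; the TensorRT plugin version does the same thing in a CUDA kernel instead.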
-
There are pros and cons with TF-TRT.
Pros: the API is easy to use; you don't need to worry about plugins.
Cons: you need to store the whole TensorFlow library on disk (a disadvantage in a deployment environment); you need to load TensorFlow into memory at runtime; it usually runs more slowly than a pure TensorRT engine.
-
For "ssd_mobilenet_v2_coco", the UFF TensorRT engine runs much faster than the TF-TRT optimized graph, for many reasons combined, I think:
- Optimization done on the graph as a whole, not just on individual nodes or parts of the graph.
- FP16 computation throughout the whole TensorRT engine, instead of (TF-TRT case) FP16 only on some optimized parts of the graph.
- More efficient implementation of NMS, etc. (plugins implemented with CUDA kernels) instead of the original TensorFlow ops.
-
I don't know the details about freezing a TensorFlow graph. But I think you are right. Nodes would not be deleted in the frozen graph. One other important aspect of freezing a TensorFlow graph is that "variables" (trainable weights) would be turned into "constants". So the frozen graph is no longer trainable.
from tensorrt_demos.
Thanks, very good explanation.
I don't understand this part. So, with the TensorRT method, if I don't install the TensorFlow library, will it still work at runtime?
What do you mean by the whole of TensorFlow being loaded into memory?
Cons: Need to store the whole TensorFlow library in HDD (a disadvantage for the deployment environment); need to load the whole TensorFlow into memory at runtime;
In the list of supported ONNX layers, NMS is unsupported. So can I not convert the SSD models to ONNX?
Yes. If you convert the whole model to a TensorRT engine (binary) and plugins, you could load the engine with only the TensorRT libraries ("libnvinfer.so", etc.) on the deployment devices. In such a case, you don't need to install TensorFlow on the device at all. And the model would consume much less memory at runtime. This is a big advantage on embedded systems.
Theoretically, we could replace unsupported layers in ONNX with plugins as well. But NVIDIA doesn't seem to provide good examples of how to do that. I see a lot of people having problems in this regard on the NVIDIA/TensorRT issues board.
@jkjung-avt
In this repo, I think we are using TF-TRT, right?
Do we have any tutorial/sample for detection using pure TensorRT?
-
No. This repo uses the Caffe parser (demos #1 & #2), the UFF parser (demo #3) and the ONNX parser (demo #4) to build the networks, and does full TensorRT optimization (using plugins when necessary) of the models.
-
I have another repo, jkjung-avt/tf_trt_models, which demonstrates how to program with TF-TRT. In that case, we do not use any TensorRT plugins. Instead, we use TensorFlow ops for the nodes which cannot be optimized by TensorRT.
-
If you are looking for example code demonstrating how to create network layers with TensorRT API, check out: https://github.com/NVIDIA/TensorRT/tree/master/samples/opensource/sampleMNISTAPI
Quick question: I find that using TF-TRT to convert the model to FP16 results in a model that is still FP32 instead of FP16. I checked the saved_model.pb.txt, and the weights are still DT_FLOAT instead of DT_HALF.
from tensorflow.python.compiler.tensorrt import trt_convert as trt
converter = trt.TrtGraphConverter(input_saved_model_dir=saved_model_dir,
                                  precision_mode="FP16")
frozen_graph = converter.convert()
according to: https://docs.nvidia.com/deeplearning/frameworks/tf-trt-user-guide/index.html#usage-example
According to your statement, "FP16 computation throughout the whole TensorRT engine, instead of (TF-TRT case) FP16 on only the optimized parts of the graph", I wonder if it's because TF-TRT doesn't have much room to optimize the model? But I find that there's a lot of folding and merging going on during the conversion.
It's likely your target device doesn't support FP16, right?
@PythonImageDeveloper Thanks a lot for the reply.
I'm currently using x86 Ubuntu 18.04 with a 2080 Ti. I think it does support FP16. So I supposed precision_mode would output an FP16 graph (with DT_HALF in saved_model.pb.txt), but it turns out not to.
Did you guys try using TrtGraphConverter with precision_mode="FP16"? TrtGraphConverter is available since TF 1.14.
Really appreciate any help!
Quick question: I find that using TF-TRT to convert the model to FP16 results in a model that is still FP32 instead of FP16. I checked the saved_model.pb.txt, and the weights are still DT_FLOAT instead of DT_HALF.
This (the saved weights are still FP32) does not prove the TF-TRT nodes/ops are not running in FP16 mode underneath. Take a look at this picture in NVIDIA's "TensorRT Integration Speeds Up TensorFlow Inference" blog post.
After the TF-TRT optimization process is done, some parts of the original TensorFlow graph get turned into single nodes. Those are TF-TRT nodes running some CUDA kernels. And it's hard to tell whether those kernels are doing FP16 or FP32 computations...
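To see why the stored dtype doesn't determine the compute dtype, here is a small NumPy sketch (illustrative only, not TF-TRT code): the weights stay FP32 in the file, while the arithmetic is done in FP16 internally, the way a fused kernel could:

```python
import numpy as np

# Weights and inputs as stored in the SavedModel: FP32 ("DT_FLOAT").
w = np.array([[1.0, 2.0], [3.0, 4.0]], dtype=np.float32)
x = np.array([0.5, 0.25], dtype=np.float32)

# Pure FP32 math, for comparison.
y_fp32 = w @ x

# A kernel can cast to FP16 at compute time and cast the result back;
# the stored weights still report float32 in the graph proto.
y_fp16 = (w.astype(np.float16) @ x.astype(np.float16)).astype(np.float32)
```

So inspecting saved_model.pb.txt for DT_HALF tells you about storage, not about what precision the TRTEngineOp kernels actually use.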
Thanks JK! I'm still reading the material. One interesting thing though:
https://docs.nvidia.com/deeplearning/frameworks/tf-trt-user-guide/index.html#mem-manage
vs.
https://devblogs.nvidia.com/tensorrt-integration-speeds-tensorflow-inference/#disqus_thread
The two pages describe per_process_gpu_memory_fraction quite differently. One says 0.67 means 0.67 of GPU memory for TensorFlow and the remainder for TensorRT. The other says 0.3 is set for TF/TF-TRT/TensorRT combined.
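For reference, a minimal configuration fragment (assuming the TF 1.x session API, as in the TF-TRT user guide; the 0.67 value is just the guide's example) that caps the TensorFlow allocator and leaves the remaining GPU memory available to TensorRT:

```python
import tensorflow as tf

# Let the TensorFlow allocator claim at most 67% of GPU memory; TensorRT
# engines created by TF-TRT allocate from what remains.
gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=0.67)
session_config = tf.ConfigProto(gpu_options=gpu_options)
# session_config is then passed to tf.Session(config=session_config).
```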
@yuqcraft Hello, have you achieved the conversion from the FP32 model to its FP16 counterpart with TF-TRT? Or do you have any other suggestions for converting an FP32 model to an FP16 one? Thanks.