Comments (11)
I'll try to answer based on what I know.
-
I believe TF-TRT was co-developed by NVIDIA and the Google TensorFlow team. Otherwise, what you've stated looks right.
-
Current releases of TensorRT support 3 kinds of "parsers": Caffe, UFF and ONNX. According to the "Deprecation of Caffe Parser and UFF Parser" paragraph in the TensorRT 7.0.0 Release Notes, the Caffe and UFF parsers are deprecated. The ONNX parser is the preferred one going forward.
In order to optimize a TensorFlow model, you have the option of converting the pb to either UFF or ONNX, and then to a TensorRT engine. In case there are layers in your model which are not supported by TensorRT directly (check this table), you could either: (1) replace those layers with plugins; or (2) leave those layers out when building the TensorRT engine, then take the TensorRT engine output and do postprocessing to complete the function of those layers.
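As a sketch of option (2), here is a minimal pure-Python illustration of greedy NMS done as host-side postprocessing on raw boxes/scores that an engine might output. The box format (x1, y1, x2, y2) and the threshold value are illustrative assumptions, not taken from the repo:

```python
def iou(a, b):
    """Intersection-over-union of two boxes in (x1, y1, x2, y2) format."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter > 0 else 0.0

def nms(boxes, scores, iou_thresh=0.5):
    """Greedy non-maximum suppression; returns indices of kept boxes."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it doesn't overlap too much with any kept box.
        if all(iou(boxes[i], boxes[j]) < iou_thresh for j in keep):
            keep.append(i)
    return keep
```

In a real deployment you'd feed this the decoded boxes and class scores copied back from the engine's output buffers; the TensorRT plugin version does the same thing in a CUDA kernel instead.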
-
There are pros and cons with TF-TRT.
Pros: the API is easy to use; you don't need to worry about plugins.
Cons: you need to store the whole TensorFlow library on disk (a disadvantage in a deployment environment); you need to load TensorFlow into memory at runtime; it usually runs more slowly than a pure TensorRT engine.
-
For "ssd_mobilenet_v2_coco", the UFF TensorRT engine runs much faster than the TF-TRT optimized graph, for many reasons combined, I think:
- Optimization done on the graph as a whole, not just on individual nodes or parts of the graph.
- FP16 computation throughout the whole TensorRT engine, instead of (TF-TRT case) FP16 only on some optimized parts of the graph.
- More efficient implementation of NMS, etc. (plugins implemented with CUDA kernels) instead of the original TensorFlow ops.
-
I don't know the details about freezing a TensorFlow graph. But I think you are right. Nodes would not be deleted in the frozen graph. One other important aspect of freezing a TensorFlow graph is that "variables" (trainable weights) would be turned into "constants". So the frozen graph is no longer trainable.
from tensorrt_demos.
Thanks, very good explanation.
I don't understand this part. So, with the TensorRT method, if I don't install the TensorFlow library, will it still work at runtime?
What do you mean by the whole of TensorFlow being loaded into memory?
Cons: Need to store the whole TensorFlow library in HDD (a disadvantage for the deployment environment); need to load the whole TensorFlow into memory at runtime;
In the list of supported ONNX layers, NMS is unsupported. So can I not convert the SSD models to ONNX?
Yes. If you convert the whole model to a TensorRT engine (binary) and plugins, you could load the engine with only the TensorRT libraries ("libnvinfer.so", etc.) on the deployment devices. In such a case, you don't need to install TensorFlow on the device at all. And the model would consume much less memory at runtime. This is a big advantage on embedded systems.
Theoretically, we could replace unsupported layers in ONNX with plugins as well. But NVIDIA doesn't seem to provide good examples of how to do that. I see a lot of people having problems in this regard on the NVIDIA/TensorRT issues board.
@jkjung-avt
In this repo, I think we are using TF-TRT, right?
Do we have any tutorial/sample for detection using pure TensorRT?
-
No. This repo uses the Caffe parser (demos #1 & #2), the UFF parser (demo #3) and the ONNX parser (demo #4) to build the networks, and does full TensorRT optimization (using plugins when necessary) of the models.
-
I have another repo, jkjung-avt/tf_trt_models, which demonstrates how to program with TF-TRT. In that case, we do not use any TensorRT plugins. Instead, we use TensorFlow ops for the nodes which cannot be optimized by TensorRT.
-
If you are looking for example code demonstrating how to create network layers with TensorRT API, check out: https://github.com/NVIDIA/TensorRT/tree/master/samples/opensource/sampleMNISTAPI
Quick question: I find that using TF-TRT to convert the model to FP16 results in a model that is still FP32 instead of FP16. I checked the saved_model.pb.txt, and the weights are still DT_FLOAT instead of DT_HALF.
from tensorflow.python.compiler.tensorrt import trt_convert as trt
converter = trt.TrtGraphConverter(input_saved_model_dir=saved_model_dir,
                                  precision_mode="FP16")
frozen_graph = converter.convert()
according to: https://docs.nvidia.com/deeplearning/frameworks/tf-trt-user-guide/index.html#usage-example
According to your statement, "FP16 computation throughout the whole TensorRT engine, instead of (TF-TRT case) FP16 on only the optimized parts of the graph", I wonder if it's because TF-TRT doesn't have much room to optimize the model? But I find that there's a lot of folding and merging going on during the conversion.
It's likely your target device doesn't support FP16, right?
@PythonImageDeveloper Thanks a lot for the reply.
I'm currently using x86 Ubuntu 18.04 with a 2080 Ti. I think it does support FP16. So I supposed precision_mode would output an FP16 graph (with DT_HALF in saved_model.pb.txt), but it turns out not to.
Did you guys try using TrtGraphConverter with precision_mode="FP16"? TrtGraphConverter is available since TF 1.14.
Really appreciate any help!
Quick question: I find that using TF-TRT to convert the model to FP16 results in a model that is still FP32 instead of FP16. I checked the saved_model.pb.txt, and the weights are still DT_FLOAT instead of DT_HALF.
This (the saved weights are still FP32) does not prove the TF-TRT nodes/ops are not running in FP16 mode underneath. Take a look at this picture in NVIDIA's "TensorRT Integration Speeds Up TensorFlow Inference" blog post.
After the TF-TRT optimization process is done, some parts of the original TensorFlow graph get turned into single nodes. Those are TF-TRT nodes running some CUDA kernels. And it's hard to tell whether those kernels are doing FP16 or FP32 computations...
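To see why the stored dtype doesn't determine the compute dtype, here is a small NumPy sketch (illustrative only, not TF-TRT code): the weights stay FP32 in the file, while the arithmetic is done in FP16 internally, the way a fused kernel could:

```python
import numpy as np

# Weights and inputs as stored in the SavedModel: FP32 ("DT_FLOAT").
w = np.array([[1.0, 2.0], [3.0, 4.0]], dtype=np.float32)
x = np.array([0.5, 0.25], dtype=np.float32)

# Pure FP32 math, for comparison.
y_fp32 = w @ x

# A kernel can cast to FP16 at compute time and cast the result back;
# the stored weights still report float32 in the graph proto.
y_fp16 = (w.astype(np.float16) @ x.astype(np.float16)).astype(np.float32)
```

So inspecting saved_model.pb.txt for DT_HALF tells you about storage, not about what precision the TRTEngineOp kernels actually use.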
Thanks JK! I'm still reading the material. One interesting thing though:
https://docs.nvidia.com/deeplearning/frameworks/tf-trt-user-guide/index.html#mem-manage
vs.
https://devblogs.nvidia.com/tensorrt-integration-speeds-tensorflow-inference/#disqus_thread
The two pages describe per_process_gpu_memory_fraction quite differently. One says 0.67 means 0.67 of GPU memory for TensorFlow and the remainder for TensorRT. The other says 0.3 is set for TF/TF-TRT/TensorRT combined.
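For reference, a minimal configuration fragment (assuming the TF 1.x session API, as in the TF-TRT user guide; the 0.67 value is just the guide's example) that caps the TensorFlow allocator and leaves the remaining GPU memory available to TensorRT:

```python
import tensorflow as tf

# Let the TensorFlow allocator claim at most 67% of GPU memory; TensorRT
# engines created by TF-TRT allocate from what remains.
gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=0.67)
session_config = tf.ConfigProto(gpu_options=gpu_options)
# session_config is then passed to tf.Session(config=session_config).
```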
@yuqcraft Hello, have you achieved the conversion from the FP32 model to its FP16 counterpart with TF-TRT? Or do you have any other suggestions for converting an FP32 model to an FP16 one? Thanks.