
Comments (11)

jkjung-avt commented on July 25, 2024

I'll try to answer based on what I know.

  1. TF-TRT was, as far as I know, co-developed by NVIDIA and the Google TensorFlow team. Otherwise, what you've stated looks right.

  2. Current releases of TensorRT support 3 kinds of "parsers": Caffe, UFF and ONNX. According to the "Deprecation of Caffe Parser and UFF Parser" paragraph in the TensorRT 7.0.0 Release Notes, the Caffe and UFF parsers are deprecated; the ONNX parser is the preferred one going forward.

    To optimize a TensorFlow model, you have the option of converting the frozen .pb to either UFF or ONNX, and then building a TensorRT engine from that (see the first code sketch after this list). If your model contains layers that TensorRT does not support directly (check this table), you could either: (1) replace those layers with plugins; or (2) leave those layers out when building the TensorRT engine, and instead post-process the engine's output to perform what those layers would have done.

  3. There are pros and cons with TF-TRT.

    Pros: The API is easy to use, and you don't need to worry about plugins.

    Cons: You need to store the whole TensorFlow library on disk (a disadvantage for the deployment environment) and load TensorFlow into memory at runtime; it also usually runs more slowly than a pure TensorRT engine.

  4. For "ssd_mobilenet_v2_coco", the UFF TensorRT engine runs much faster than the TF-TRT optimized graph, I think for several reasons combined:

    • Optimization done on the graph as a whole, not just on individual nodes or parts of the graph.
    • FP16 computation throughout the whole TensorRT engine, instead of (TF-TRT case) FP16 only on some optimized parts of the graph.
    • More efficient implementation of NMS, etc. (plugins implemented with CUDA kernels) instead of the original TensorFlow ops.
  5. I don't know the details about freezing a TensorFlow graph, but I think you are right: nodes are not deleted in the frozen graph. One other important aspect of freezing is that "variables" (trainable weights) are turned into "constants", so the frozen graph is no longer trainable (a second sketch, for freezing, follows below).
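Here's a rough sketch of the pb -> ONNX -> TensorRT flow mentioned in point 2. It is only an illustration, not code from this repo: the file names and tensor names are placeholders, tf2onnx is assumed to be installed, and the builder calls follow the TensorRT 7-era Python API (some of them are deprecated in later releases).

# Step 1: frozen graph (.pb) -> ONNX with the tf2onnx command-line tool.
# The input/output tensor names below are placeholders; use your model's real ones.
#   python3 -m tf2onnx.convert --input model.pb --inputs image_tensor:0 \
#           --outputs detections:0 --output model.onnx

# Step 2: ONNX -> TensorRT engine (TensorRT 7-era Python API).
import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

def build_engine(onnx_path):
    explicit_batch = 1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)
    with trt.Builder(TRT_LOGGER) as builder, \
         builder.create_network(explicit_batch) as network, \
         trt.OnnxParser(network, TRT_LOGGER) as parser:
        builder.max_workspace_size = 1 << 28   # 256 MiB of build workspace
        builder.fp16_mode = True               # build FP16 kernels where the GPU supports them
        with open(onnx_path, 'rb') as f:
            if not parser.parse(f.read()):     # report unsupported layers, if any
                for i in range(parser.num_errors):
                    print(parser.get_error(i))
                return None
        return builder.build_cuda_engine(network)

engine = build_engine('model.onnx')
with open('model.engine', 'wb') as f:
    f.write(engine.serialize())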

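And for point 5, a minimal TF 1.x freezing sketch, assuming a checkpoint named 'model.ckpt' and a single output node called 'output_node' (both placeholders):

import tensorflow as tf

with tf.compat.v1.Session() as sess:
    saver = tf.compat.v1.train.import_meta_graph('model.ckpt.meta')
    saver.restore(sess, 'model.ckpt')
    # The "freezing" step: variables (trainable weights) become constants.
    frozen = tf.compat.v1.graph_util.convert_variables_to_constants(
        sess, sess.graph.as_graph_def(), ['output_node'])
    with tf.io.gfile.GFile('frozen.pb', 'wb') as f:
        f.write(frozen.SerializeToString())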

PythonImageDeveloper commented on July 25, 2024

Thanks, very good explanation.
I don't understand this part. So with the pure TensorRT method, if I don't install the TensorFlow library, will it still work at runtime?
What do you mean by the whole TensorFlow being loaded into memory?

Cons: You need to store the whole TensorFlow library on disk (a disadvantage for the deployment environment) and load TensorFlow into memory at runtime;

In the list of supported ONNX layers, NMS is not supported. So I can't convert the SSD models to ONNX?


jkjung-avt commented on July 25, 2024

Yes. If you convert the whole model to a TensorRT engine (binary) plus plugins, you can load the engine with only the TensorRT libraries ("libnvinfer.so", etc.) on the deployment device. In that case, you don't need to install TensorFlow on the device at all, and the model consumes much less memory at runtime. This is a big advantage on embedded systems.
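For reference, loading such an engine on the device needs nothing but the TensorRT runtime. A minimal sketch (the engine file name is a placeholder, and buffer allocation with pycuda is omitted):

import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

# Deserialize a previously built engine; no TensorFlow involved.
with open('model.engine', 'rb') as f, trt.Runtime(TRT_LOGGER) as runtime:
    engine = runtime.deserialize_cuda_engine(f.read())
context = engine.create_execution_context()
# Allocate input/output buffers (e.g. with pycuda), then run inference with:
# context.execute_v2(bindings)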

Theoretically, we could replace unsupported layers in ONNX with plugins as well, but NVIDIA doesn't seem to provide good examples of how to do that. I see a lot of people having problems in this regard on the NVIDIA/TensorRT issues board.


xmuszq commented on July 25, 2024

@jkjung-avt
In this repo, I think we are using TF-TRT, right?
Do we have any tutorial/sample for detection using pure TensorRT?


jkjung-avt commented on July 25, 2024
  1. No. This repo uses the Caffe parser (demos #1 and #2), the UFF parser (demo #3) and the ONNX parser (demo #4) to build the networks, and does full TensorRT optimization (using plugins when necessary) of the models.

  2. I have another repo, jkjung-avt/tf_trt_models, which demonstrates how to program with TF-TRT. In that case, we do not use any TensorRT plugins. Instead, we use TensorFlow ops for nodes which cannot be optimized by TensorRT.

  3. If you are looking for example code demonstrating how to create network layers with the TensorRT API, check out: https://github.com/NVIDIA/TensorRT/tree/master/samples/opensource/sampleMNISTAPI
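That sample is written in C++, but the idea translates to Python as well. A very rough sketch using the TensorRT 7-era Python network-definition API (the layer sizes and weights below are made up just to show the calls; several of these APIs are deprecated or removed in TensorRT 8+):

import numpy as np
import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

builder = trt.Builder(TRT_LOGGER)
network = builder.create_network()                        # implicit-batch network
data = network.add_input('data', trt.float32, (1, 28, 28))

# Dummy weights for illustration; a real model would load trained values.
conv_w = trt.Weights(np.random.randn(20, 1, 5, 5).astype(np.float32))
conv_b = trt.Weights(np.zeros(20, dtype=np.float32))
conv = network.add_convolution(data, 20, (5, 5), conv_w, conv_b)
relu = network.add_activation(conv.get_output(0), trt.ActivationType.RELU)

network.mark_output(relu.get_output(0))
builder.max_workspace_size = 1 << 28
engine = builder.build_cuda_engine(network)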


yuqcraft commented on July 25, 2024

@jkjung-avt

Quick question: I found that using TF-TRT to convert the model to FP16 results in a model that is still FP32 instead of FP16. I checked the saved_model.pb.txt, and the weights are still DT_FLOAT instead of DT_HALF.

from tensorflow.python.compiler.tensorrt import trt_convert as trt

converter = trt.TrtGraphConverter(input_saved_model_dir=saved_model_dir,
                                  precision_mode="FP16")

according to this link: https://docs.nvidia.com/deeplearning/frameworks/tf-trt-user-guide/index.html#usage-example

According to your statement "FP16 computation throughout the whole TensorRT engine, instead of (TF-TRT case) FP16 on only the optimized parts of the graph", I wonder if it's because TF-TRT doesn't have much room to optimize the model? But I found that there's a lot of folding and merging going on during the conversion.


PythonImageDeveloper commented on July 25, 2024

Could it be that your target device doesn't support FP16?


yuqcraft commented on July 25, 2024

@PythonImageDeveloper Thanks a lot for the reply.

I'm currently using x86 Ubuntu 18.04 with a 2080 Ti, which I think does support FP16. So I expected precision_mode to produce an FP16 graph (with DT_HALF in saved_model.pb.txt), but it turns out it doesn't.

Did you guys try using TrtGraphConverter with precision_mode="FP16"? TrtGraphConverter is supported since TensorFlow 1.14.

Really appreciate any help!


jkjung-avt commented on July 25, 2024

Quick question: I found that using TF-TRT to convert the model to FP16 results in a model that is still FP32 instead of FP16. I checked the saved_model.pb.txt, and the weights are still DT_FLOAT instead of DT_HALF.

This (the saved weights are still FP32) does not prove that the TF-TRT nodes/ops are not running in FP16 mode underneath. Take a look at this picture in NVIDIA's "TensorRT Integration Speeds Up TensorFlow Inference" blog post.

[Figure: optimization_result_fig2, from the blog post]

After the TF-TRT optimization process is done, some parts of the original TensorFlow graph get turned into single nodes. Those are TF-TRT nodes that run CUDA kernels underneath, and from the saved graph alone it's hard to tell whether those kernels are doing FP16 or FP32 computation (a quick sanity check is sketched below).
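One thing that is easy to verify is whether TF-TRT created any TRTEngineOp nodes at all, by inspecting the converted GraphDef. A minimal sketch, assuming the converted graph was saved as 'trt_graph.pb' (a placeholder name); note this still doesn't reveal the precision used inside each engine:

import tensorflow as tf

# Count how many subgraphs TF-TRT converted into TensorRT engine nodes.
graph_def = tf.compat.v1.GraphDef()
with tf.io.gfile.GFile('trt_graph.pb', 'rb') as f:
    graph_def.ParseFromString(f.read())

trt_ops = [n for n in graph_def.node if n.op == 'TRTEngineOp']
print('TRTEngineOp nodes: %d out of %d total nodes'
      % (len(trt_ops), len(graph_def.node)))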


yuqcraft commented on July 25, 2024

@jkjung-avt

Thanks JK! I'm still reading the material. One interesting thing though:

https://docs.nvidia.com/deeplearning/frameworks/tf-trt-user-guide/index.html#mem-manage
vs.
https://devblogs.nvidia.com/tensorrt-integration-speeds-tensorflow-inference/#disqus_thread

They describe per_process_gpu_memory_fraction quite differently. One says 0.67 means 0.67 of GPU memory goes to TensorFlow and the remainder to TensorRT. The other says 0.3 is what TF/TF-TRT/TensorRT share.
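For context, this is the TF 1.x setting both pages are describing; a minimal sketch (the 0.67 value just mirrors the user guide's example):

import tensorflow as tf

# TF 1.x: cap TensorFlow's share of GPU memory.
# Whether this fraction also covers TensorRT's own allocations is exactly
# the ambiguity between the two documents discussed above.
gpu_options = tf.compat.v1.GPUOptions(per_process_gpu_memory_fraction=0.67)
config = tf.compat.v1.ConfigProto(gpu_options=gpu_options)
with tf.compat.v1.Session(config=config) as sess:
    pass  # run the (TF-TRT optimized) graph here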


YouSenRong commented on July 25, 2024

@yuqcraft Hello, have you managed to convert an FP32 model to its FP16 counterpart with TF-TRT? Or do you have any other suggestions for converting an FP32 model to FP16? Thanks.
