
ardianumam / tensorflow-tensorrt

This repository is for my YouTube video series about optimizing a TensorFlow deep learning model using TensorRT. We demonstrate optimizing a LeNet-like model and a YOLOv3 model, obtaining 3.7x and 1.5x speedups, respectively, compared to the original models.

Languages: Jupyter Notebook 81.84%, Python 18.16%


tensorflow-tensorrt's Issues

Problem defining output tensor

I've converted my darknet model using https://github.com/jinyu121/DW2TF, which gives me the following files:

  • yolov3-customv1.ckpt.data-00000-of-00001
  • yolov3-customv1.ckpt.index
  • yolov3-customv1.ckpt.meta
  • yolov3-customv1.pb
I then call your script with:

  with tf.Session(config=tf.ConfigProto(gpu_options=tf.GPUOptions(per_process_gpu_memory_fraction=0.50))) as sess:
      saver = tf.train.import_meta_graph("/home/nvidia/Downloads/DW2TF/data/yolov3-customv1.ckpt.meta")
      saver.restore(sess, "yolov3-customv1.ckpt")
      your_outputs = ["output_tensor/random"]
      your_outputs = ["output_tensor/Softmax"]
      frozen_graph = tf.graph_util.convert_variables_to_constants(
          sess,                                    # session
          tf.get_default_graph().as_graph_def(),   # graph + weights from the session
          output_node_names=your_outputs)
      with gfile.FastGFile("./model/frozen_model.pb", 'wb') as f:
          f.write(frozen_graph.SerializeToString())
      print("Frozen model is successfully stored!")

Then, I receive the following error:
AssertionError: output_tensor/Softmax is not in graph

So I'm not sure what needs to change.
Could anyone help me, please?
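
For reference, one way to find the real output node name is to list every node in the imported graph; a minimal sketch, assuming TF 1.x:

  import tensorflow as tf

  # Minimal sketch (TF 1.x): print all node names in the imported graph so the
  # actual output node(s) of the DW2TF-converted model can be identified.
  tf.train.import_meta_graph(
      "/home/nvidia/Downloads/DW2TF/data/yolov3-customv1.ckpt.meta")
  for node in tf.get_default_graph().as_graph_def().node:
      print(node.name)

A DW2TF-converted graph likely has no node named output_tensor/Softmax, which would explain the assertion.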

My Yolov3.cfg can be found at: https://github.com/pjreddie/darknet/blob/master/cfg/yolov3.cfg

Thanks.

read_pb is slow

Dear author,

It's a great project, and the result is good!

But when I ran YOLOv3 with TensorRT on a TX2, it took a long time (about 10-20 minutes) to run read_pb_return_tensors().
Is this expected? I'm wondering whether I did something wrong...

Thanks

Performance about TensorRT

I tried to accelerate my TensorFlow code by using TensorRT, but I didn't get any improvement:

  in frozen_model, num of all_nodes = 893
  in TensorRT_model, num of trt_engine_nodes = 0
  in TensorRT_model, num of all_nodes = 831
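
For reference, these counts can be reproduced by iterating over the graph nodes; a minimal sketch, assuming trt_graph is the GraphDef returned by trt.create_inference_graph():

  # Minimal sketch: count TRTEngineOp nodes in the converted graph. A count of
  # 0 means no subgraph was converted to a TensorRT engine, so no speedup is
  # expected.
  trt_engine_nodes = len([n for n in trt_graph.node if str(n.op) == "TRTEngineOp"])
  all_nodes = len(trt_graph.node)
  print("num of trt_engine_nodes =", trt_engine_nodes)
  print("num of all_nodes =", all_nodes)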
Is there anything wrong? Thanks!

Keras convert to TensorFlow

Thanks very much for your work! I have a problem: when converting Keras to TensorFlow, the result should have two files, but I get only one, named 'checkpoint'. Why?
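
For comparison, a minimal sketch of saving a TF 1.x session that backs a Keras model; the model and paths here are illustrative assumptions, not from the original post:

  import os
  import tensorflow as tf
  from tensorflow import keras
  from tensorflow.keras import backend as K

  # Minimal sketch (TF 1.x, hypothetical model): saver.save() normally writes
  # several files next to "checkpoint" (model.ckpt.index, model.ckpt.meta,
  # model.ckpt.data-00000-of-00001), not the "checkpoint" file alone.
  model = keras.Sequential([keras.layers.Dense(10, input_shape=(4,))])
  os.makedirs("./model", exist_ok=True)
  saver = tf.train.Saver()
  saver.save(K.get_session(), "./model/model.ckpt")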

Huge Memory Consumption

I was wondering whether you created a swapfile on your TX2. I am trying to run the YOLOv3 optimization on a Jetson Nano, and it consumes 3.4 GB of built-in memory and 6.7 GB of swap while max workspace is set to 256 MB. Is this normal?

Where do I change the anchors? And tips for improving FPS on certain models?

Hi,

I have darknet weights for three classes, trained with recalculated anchors.
I followed the instructions from https://github.com/AlexeyAB/darknet.
When I try these weights on your TensorRT implementation, the bounding boxes are off and are large in size.
Where do I update my new anchor values? I searched through utils.py and can only find anchor masks.
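
For concreteness, here is a hypothetical sketch of what the anchor values look like: YOLOv3 uses nine (width, height) pairs, three per detection scale (the values below are the stock COCO anchors, not recalculated ones):

  # Hypothetical sketch: the stock COCO anchors for a 416x416 input. If you
  # recalculated anchors for your dataset, they must replace the defaults
  # everywhere they are hard-coded before the graph is frozen.
  ANCHORS = [(10, 13),  (16, 30),   (33, 23),    # 52x52 grid, small objects
             (30, 61),  (62, 45),   (59, 119),   # 26x26 grid, medium objects
             (116, 90), (156, 198), (373, 326)]  # 13x13 grid, large objects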

Regarding performance:

On the pretrained YOLOv3, it gives 30 fps.
On the two-class YOLOv3 model, it gives 20 fps.
On the three-class YOLOv3 model (the one with the anchors problem), it gives 35 fps!

I am trying to deploy the two-class YOLO model, which needs to be fast enough for real-life applications; I'd like 25+ fps.
When I add the third class into the mix, it gives me 35+ fps!
It really surprised me that merely adding a class almost doubles the performance.
If I want to maximize performance, should I just add a redundant class and train it along with the two classes (which is all I really need)?

Lastly, will I be able to achieve 25+ fps on an NVIDIA AGX Xavier? I have never tried any of NVIDIA's single-board solutions before.

Thanks for your great work on this repo.

Segmentation Fault while creating TRT_GRAPH

When I run notebook 7, specifically this block:

  your_outputs = ["Placeholder:0", "concat_9:0", "mul_9:0"]
  trt_graph = trt.create_inference_graph(
      input_graph_def=frozen_graph,          # frozen model
      outputs=your_outputs,
      max_batch_size=1,                      # specify your max batch size
      max_workspace_size_bytes=2*(10**9),    # specify the max workspace
      precision_mode="FP16")                 # precision, can be "FP32" or "FP16"

I get a segmentation fault:

  [1] 20875 segmentation fault (core dumped)  sudo python3 yoloconvert.py

Anyone else run into this problem?

Running against TensorRT version 0.0.0

Dear Ardian,

Thanks for your work. However, I met some issues when trying to run your project.

When I ran the YOLOv3 optimization, there was not much improvement; the TensorRT model is not right. There is some info as follows:

INFO: tensorflow: Running against TensorRT version 0.0.0

it seems there is no TRTEngineOp in the model:

numb. of trt_engine_node in TensorRT graph: 0

Here is my environment: Ubuntu 16.04, tensorflow-gpu 1.13.1, CUDA 10.1, cuDNN 7.5 and TensorRT 5.1.5.0.

All the imported packages are right, such as:
import tensorrt
import tensorflow.contrib.tensorrt as trt
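
As a sanity check (a minimal sketch of my own, not from the repo), compare the standalone TensorRT version with what TensorFlow logs at conversion time; "Running against TensorRT version 0.0.0" typically means the installed TensorFlow wheel was not built against TensorRT at all:

  import tensorrt
  print(tensorrt.__version__)   # standalone library, e.g. "5.1.5.0"

  # The TF-TRT bridge only emits TRTEngineOp nodes if the tensorflow-gpu
  # wheel itself was compiled against a matching TensorRT; a wheel built
  # without TensorRT reports version 0.0.0 and converts nothing.
  import tensorflow.contrib.tensorrt as trt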

Could you help me with this issue? Is my environment wrong?

Much thanks,

Bigbai

trt_engine_nodes in TensorRT graph: 0

TensorRT model is successfully stored!
numb. of all_nodes in frozen graph: 8016
numb. of trt_engine_nodes in TensorRT graph: 0
numb. of all_nodes in TensorRT graph: 882

I don't know why the model has no TensorRT engine nodes and the TRT model's FPS is the same as the original's.
Environment: Ubuntu 18.04, GTX 1080 Ti, TensorRT 5.1.5.0, TensorFlow 1.14.0.

Converted YOLOv3 model has the same size as before

Hi,
First of all, thanks for your tutorial. It helps me a lot.
I am confused: is it normal that the converted yoloRT.pb model still has the same size as before?
In my case, I see the number of nodes go down from 1800 to 1400, but the model size has not changed.
And eventually, the speed of the model didn't improve much either.

Looking forward to your response :)

TensorFlow to frozen model error

Hello, I am using a Linux-based Jetson Nano, and when trying to convert my TensorFlow model to a frozen model I get this error:

assert d in name_to_node, "%s is not in graph" % d
AssertionError: output_tensor/softmax is not in graph

INT8 support

So I tried using INT8 instead of FP16 for optimizing YOLOv3. Instead of getting a speedup, it was taking 1200+ ms per image.
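
For reference, a minimal sketch of the INT8 path in the TF 1.x contrib API. My assumption (not a confirmed diagnosis) is that INT8 mode first produces a calibration graph that must be fed representative images before conversion to the final inference graph, and running the uncalibrated graph is very slow:

  # Minimal sketch (TF 1.x contrib API): INT8 requires a calibration pass.
  calib_graph = trt.create_inference_graph(
      input_graph_def=frozen_graph,
      outputs=your_outputs,
      max_batch_size=1,
      max_workspace_size_bytes=2*(10**9),
      precision_mode="INT8")
  # ... feed representative input images through calib_graph here ...
  trt_graph = trt.calib_graph_to_infer_graph(calib_graph)  # final INT8 graph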

My environment:
Ubuntu 18.10
Python 3.7.1
CUDA 10.0
cuDNN 7.5.0
Tensorflow-gpu 1.13.1
TensorRT 5.0.2.6
GTX 1070

Run error

Hello,
I am getting this error of "Killed" when I run the first step of the converter.

difficulty generating correct frozen .pb

I understand that you generated the provided yolov3_gpu_nms.pb using the pretrained COCO weights from https://pjreddie.com/media/files/yolov3.weights, placed in the checkpoint dir of this repo: https://github.com/ardianumam/tensorflow-yolov3

My question is: what exactly did you run to generate the .pb? I generated my .pb with python convert_weight.py --convert --freeze, but it doesn't give the correct output tensors, as illustrated below.

  input_tensor, output_tensors = \
      utils.read_pb_return_tensors(tf.get_default_graph(),
                                   GIVEN_ORIGINAL_YOLOv3_MODEL,
                                   ["Placeholder:0", "concat_9:0", "mul_9:0"])
  print("\n\ninput_tensor\n", input_tensor)
  print("\n\noutput_tensors\n", output_tensors)

Output for the provided .pb (this works):
input_tensor
 Tensor("import/Placeholder:0", shape=(1, 416, 416, 3), dtype=float32)

output_tensors
 [<tf.Tensor 'import/concat_9:0' shape=(1, 10647, 4) dtype=float32>, <tf.Tensor 'import/mul_9:0' shape=(1, 10647, 80) dtype=float32>]

Output for my .pb (this doesn't work; note the output tensor shapes):
input_tensor
 Tensor("import/Placeholder:0", shape=(1, 416, 416, 3), dtype=float32)

output_tensors
 [<tf.Tensor 'import/concat_9:0' shape=(1, 10647, 4) dtype=float32>, <tf.Tensor 'import/mul_9:0' shape=(?,) dtype=int32>]

[I've tried this conversion using TF 1.11, 1.12, and 1.13rc2; all give the same results.]

Changing number of classes of YOLOv3

Hi,
I want to know how the line your_outputs = ["Placeholder:0", "concat_9:0", "mul_9:0"] would change if, instead of having 10 classes as output, we only have 4 classes.
An error appears when I convert my graph to the TRT graph:

Cannot assign a device for operation 'unstack_9': Could not satisfy explicit device specification '/device:GPU:0' because no supported kernel for GPU devices is available.
Registered kernels:
device='CPU'; T in [DT_VARIANT]
device='CPU'; T in [DT_RESOURCE]
device='CPU'; T in [DT_STRING]
device='CPU'; T in [DT_BOOL]
device='CPU'; T in [DT_COMPLEX128]
device='CPU'; T in [DT_COMPLEX64]
device='CPU'; T in [DT_DOUBLE]
device='CPU'; T in [DT_FLOAT]
device='CPU'; T in [DT_BFLOAT16]
device='CPU'; T in [DT_HALF]
device='CPU'; T in [DT_INT8]
device='CPU'; T in [DT_UINT8]
device='CPU'; T in [DT_INT16]
device='CPU'; T in [DT_UINT16]
device='CPU'; T in [DT_INT32]
device='CPU'; T in [DT_INT64]
device='GPU'; T in [DT_INT64]
device='GPU'; T in [DT_INT32]
device='GPU'; T in [DT_BFLOAT16]
device='GPU'; T in [DT_DOUBLE]
device='GPU'; T in [DT_FLOAT]
device='GPU'; T in [DT_HALF]

[[{{node unstack_9}} = UnpackT=DT_STRING, axis=0, num=24, _device="/device:GPU:0"]]
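
A workaround worth trying (my suggestion, not from the original thread): the failing unstack_9 runs on DT_STRING, for which no GPU kernel is registered, so letting TensorFlow fall back to the CPU for such ops avoids the explicit-device error:

  # Sketch: allow ops without a GPU kernel (e.g. Unpack on DT_STRING) to be
  # placed on the CPU instead of failing on the explicit GPU device spec.
  config = tf.ConfigProto(allow_soft_placement=True)
  with tf.Session(config=config) as sess:
      pass  # run the conversion / inference here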

Thank you

Yolov3 model operation did not accelerate significantly

Hello, thank you very much for sharing. It's a great project.
After I used TensorRT to convert the YOLOv3 model according to your steps, I found that the speed improvement is not obvious.
The FPS of the original model and the TensorRT model are both about 30.
The platform I use is Google Colab. It's strange that on a CPU-only platform the acceleration is obvious, while on the GPU platform it is not.
I'm very confused about this. What's wrong?
(Has Google Colab already optimized the model computation?)

Baseline configuration

Hi,

Please update the README.md with a description of your environment. I know from your last video that you're probably running on a desktop with a GTX 1060, but which versions of the NVIDIA drivers, CUDA, cuDNN and TensorRT?
Similarly, please provide your setup for the Jetson.

No Speedup Observed

Hi, thank you for this excellent tutorial guide! The problem I encountered is that I was unable to observe any speedup when I ran the code in 7_optimizing_YOLOv3_using_TensorRT. In fact, both with and without TensorRT I observed a rate of only 1.5 fps.

There were 2 warnings in the console:

  1. I tensorflow/core/grappler/devices.cc:51] Number of eligible GPUs (core count >= 8): 0
  2. I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
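
The first warning suggests TensorFlow does not see an eligible GPU, which by itself would explain the absence of any TF-TRT speedup; a minimal check, assuming TF 1.x:

  from tensorflow.python.client import device_lib

  # If no "/device:GPU:0" entry appears here, TensorFlow is running on the
  # CPU only and TF-TRT cannot place any TRT engine on a GPU.
  print(device_lib.list_local_devices())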

Any advice is appreciated!

Requirements file please

Dear famous and glorious author,

I am glad there is someone using TensorRT. But could you please add a requirements.txt, so that I don't face errors like "no module named ..." while running the notebooks?

Thanks

How to implement TensorRT on custom number of classes

Hi,
I have a weights file trained on darknet for two classes.
I used your repo https://github.com/ardianumam/tensorflow-yolov3 to convert my weights file to a frozen graph.
Since num_classes is read from coco.names, I changed coco.names to the classes I have.
I successfully generated the frozen model and copied it over to your TensorRT repo to generate the TensorRT model.
It created the TensorRT model successfully.
When I run the inference block of the code, it returns the following error:

2019-07-15 13:15:25.604401: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-07-15 13:15:25.719409: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:964] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-07-15 13:15:25.719738: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1411] Found device 0 with properties:
name: GeForce GTX 970 major: 5 minor: 2 memoryClockRate(GHz): 1.2405
pciBusID: 0000:02:00.0
totalMemory: 3.94GiB freeMemory: 3.70GiB
2019-07-15 13:15:25.719751: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1490] Adding visible gpu devices: 0
2019-07-15 13:15:26.679452: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-07-15 13:15:26.679482: I tensorflow/core/common_runtime/gpu/gpu_device.cc:977] 0
2019-07-15 13:15:26.679488: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990] 0: N
2019-07-15 13:15:26.679607: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1103] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 2017 MB memory) -> physical GPU (device: 0, name: GeForce GTX 970, pci bus id: 0000:02:00.0, compute capability: 5.2)
Segmentation fault (core dumped)

Since my number of classes differs from COCO's 80, should I be following any additional steps?
Please excuse my limited knowledge of TensorFlow, as I am not from a computer science background.

Thanks.

typo in path for demo video

In the YOLO .ipynb, the demo video path is given as video_path = "./dataset/demo_video/road.mp4" # path for video input, but in the repo the file is 'road2.mp4'.

how to get tf format for custom yolo weights

Thanks a lot for this highly educational repo!

I would like to train my own YOLOv3 model and try deploying it as a TRT engine the way you have. How would you recommend doing so? Did you use a TF implementation of YOLOv3 to generate your weights, or did you convert from the darknet .weights/.cfg? Whichever approach, I'd be interested to know which specific repos/tools you used.

Custom TensorFlow-Yolov3 to TF-TRT

Hi,
I am using a TensorFlow version of YOLOv3; it's not the same as the darknet one. It uses two YOLOv3 networks for feature extraction from visual and infrared images, then performs feature fusion and finally object detection.

My project runs on a GTX 1080 Ti at about 40 FPS, but on a Xavier NX the speed is 2 FPS.
Now my goal is to convert this TensorFlow model to ONNX and a TRT engine to speed it up on the Xavier NX.
I have weights in both .ckpt and .pb format.
1. What steps should I follow? I am really confused; there are many confusing articles about TF-TRT and TensorRT but no clear guidelines.
2. Do I need to use TF-TRT or TensorRT?
3. Can you give me a roadmap for this task? Will it help speed up detection on the Xavier NX?

I have spent about 15 days trying by myself but failed, so I finally decided to post my question here. I hope you can guide me in this regard.

Thanks.

No performance improvement on GTX 1080 Ti

Hi,
I optimized my trained models (1 class), ssdlite_mobilenetv2 and ssd_resnet50, with TensorRT, but the performance didn't improve significantly: I only went from 0.12 s to 0.11 s per inference on a GTX 1080 Ti. Why?
I installed TensorFlow 1.12.0, CUDA 9, and TensorRT 4.0.1.6 on Ubuntu 16.04.
