
ardianumam / tensorflow-tensorrt

This repository is for my YouTube video series about optimizing a TensorFlow deep learning model using TensorRT. We demonstrate optimizing a LeNet-like model and a YOLOv3 model, obtaining 3.7x and 1.5x speedups, respectively, compared to the original models.

Languages: Jupyter Notebook 81.84%, Python 18.16%


tensorflow-tensorrt's Issues

Problem defining output tensor

I've converted my darknet model using https://github.com/jinyu121/DW2TF, which gives me the following files:

  • yolov3-customv1.ckpt.data-00000-of-00001
  • yolov3-customv1.ckpt.index
  • yolov3-customv1.ckpt.meta
  • yolov3-customv1.pb
I then call your script with:

  with tf.Session(config=tf.ConfigProto(gpu_options=tf.GPUOptions(per_process_gpu_memory_fraction=0.50))) as sess:
      saver = tf.train.import_meta_graph("/home/nvidia/Downloads/DW2TF/data/yolov3-customv1.ckpt.meta")
      saver.restore(sess, "yolov3-customv1.ckpt")
      your_outputs = ["output_tensor/random"]
      your_outputs = ["output_tensor/Softmax"]
      frozen_graph = tf.graph_util.convert_variables_to_constants(
          sess,                                    # session
          tf.get_default_graph().as_graph_def(),   # graph + weights from the session
          output_node_names=your_outputs)
      with gfile.FastGFile("./model/frozen_model.pb", 'wb') as f:
          f.write(frozen_graph.SerializeToString())
      print("Frozen model is successfully stored!")

Then, I receive the following error:
AssertionError: output_tensor/Softmax is not in graph

So I'm not sure what needs to change.
Could anyone help me, please?
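
For reference, one way to find the real output node name is to list every node in the imported graph; a minimal sketch, assuming TF 1.x:

  import tensorflow as tf

  # Minimal sketch (TF 1.x): print all node names in the imported graph so the
  # actual output node(s) of the DW2TF-converted model can be identified.
  tf.train.import_meta_graph(
      "/home/nvidia/Downloads/DW2TF/data/yolov3-customv1.ckpt.meta")
  for node in tf.get_default_graph().as_graph_def().node:
      print(node.name)

A DW2TF-converted graph likely has no node named output_tensor/Softmax, which would explain the assertion.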

My Yolov3.cfg can be found at: https://github.com/pjreddie/darknet/blob/master/cfg/yolov3.cfg

Thanks.

read_pb is slow

Dear author,

It's a great project, and the result is good!

But when I ran YOLOv3 with TensorRT on a TX2, it took a long time (about 10-20 minutes) to run read_pb_return_tensors().
Is this expected? I'm wondering whether I did something wrong...

Thanks

Performance about TensorRT

I tried to accelerate my TensorFlow code by using TensorRT, but I didn't get any improvement:

  in frozen_model, num of all_nodes = 893
  in TensorRT_model, num of trt_engine_nodes = 0
  in TensorRT_model, num of all_nodes = 831
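
For reference, these counts can be reproduced by iterating over the graph nodes; a minimal sketch, assuming trt_graph is the GraphDef returned by trt.create_inference_graph():

  # Minimal sketch: count TRTEngineOp nodes in the converted graph. A count of
  # 0 means no subgraph was converted to a TensorRT engine, so no speedup is
  # expected.
  trt_engine_nodes = len([n for n in trt_graph.node if str(n.op) == "TRTEngineOp"])
  all_nodes = len(trt_graph.node)
  print("num of trt_engine_nodes =", trt_engine_nodes)
  print("num of all_nodes =", all_nodes)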
Is there anything wrong? Thanks!

Keras convert to TensorFlow

Thanks very much for your work! I have a problem: when converting Keras to TensorFlow, the result should have two files, but I get only one, named 'checkpoint'. Why?
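
For comparison, a minimal sketch of saving a TF 1.x session that backs a Keras model; the model and paths here are illustrative assumptions, not from the original post:

  import os
  import tensorflow as tf
  from tensorflow import keras
  from tensorflow.keras import backend as K

  # Minimal sketch (TF 1.x, hypothetical model): saver.save() normally writes
  # several files next to "checkpoint" (model.ckpt.index, model.ckpt.meta,
  # model.ckpt.data-00000-of-00001), not the "checkpoint" file alone.
  model = keras.Sequential([keras.layers.Dense(10, input_shape=(4,))])
  os.makedirs("./model", exist_ok=True)
  saver = tf.train.Saver()
  saver.save(K.get_session(), "./model/model.ckpt")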

Huge Memory Consumption

I was wondering whether you created a swapfile on your TX2. I am trying to run the YOLOv3 optimization on a Jetson Nano, and it consumes 3.4 GB of built-in memory and 6.7 GB of swap while max workspace is set to 256 MB. Is this normal?

Where do I change the anchors? And tips for improving FPS on certain models?

Hi,

I have darknet weights for three classes, trained with recalculated anchors.
I followed the instructions from https://github.com/AlexeyAB/darknet.
When I try these weights on your TensorRT implementation, the bounding boxes are off and are large in size.
Where do I update my new anchor values? I searched through utils.py and can only find anchor masks.
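
For concreteness, here is a hypothetical sketch of what the anchor values look like: YOLOv3 uses nine (width, height) pairs, three per detection scale (the values below are the stock COCO anchors, not recalculated ones):

  # Hypothetical sketch: the stock COCO anchors for a 416x416 input. If you
  # recalculated anchors for your dataset, they must replace the defaults
  # everywhere they are hard-coded before the graph is frozen.
  ANCHORS = [(10, 13),  (16, 30),   (33, 23),    # 52x52 grid, small objects
             (30, 61),  (62, 45),   (59, 119),   # 26x26 grid, medium objects
             (116, 90), (156, 198), (373, 326)]  # 13x13 grid, large objects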

Regarding performance:

On the pretrained YOLOv3, it gives 30 fps.
On the two-class YOLOv3 model, it gives 20 fps.
On the three-class YOLOv3 model (the one with the anchors problem), it gives 35 fps!

I am trying to deploy the two-class YOLO model, which needs to be fast enough for real-life applications; I'd like 25+ fps.
When I add the third class into the mix, it gives me 35+ fps!
It really surprised me that merely adding a class almost doubles the performance.
If I want to maximize performance, should I just add a redundant class and train it along with the two classes (which is all I really need)?

Lastly, will I be able to achieve 25+ fps on an NVIDIA AGX Xavier? I have never tried any of NVIDIA's single-board solutions before.

Thanks for your great work on this repo.

Segmentation Fault while creating TRT_GRAPH

When I run notebook 7, specifically this block:

  your_outputs = ["Placeholder:0", "concat_9:0", "mul_9:0"]
  trt_graph = trt.create_inference_graph(
      input_graph_def=frozen_graph,          # frozen model
      outputs=your_outputs,
      max_batch_size=1,                      # specify your max batch size
      max_workspace_size_bytes=2*(10**9),    # specify the max workspace
      precision_mode="FP16")                 # precision, can be "FP32" or "FP16"

I get a segmentation fault:

  [1] 20875 segmentation fault (core dumped)  sudo python3 yoloconvert.py

Anyone else run into this problem?

Running against TensorRT version 0.0.0

Dear Ardian,

Thanks for your work. However, I met some issues when trying to run your project.

When I ran the YOLOv3 optimization, there was not much improvement; the TensorRT model is not right. There is some info as follows:

INFO: tensorflow: Running against TensorRT version 0.0.0

it seems there is no TRTEngineOp in the model:

numb. of trt_engine_node in TensorRT graph: 0

Here is my environment: Ubuntu 16.04, tensorflow-gpu 1.13.1, CUDA 10.1, cuDNN 7.5 and TensorRT 5.1.5.0.

All the imported packages are right, such as:
import tensorrt
import tensorflow.contrib.tensorrt as trt
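
As a sanity check (a minimal sketch of my own, not from the repo), compare the standalone TensorRT version with what TensorFlow logs at conversion time; "Running against TensorRT version 0.0.0" typically means the installed TensorFlow wheel was not built against TensorRT at all:

  import tensorrt
  print(tensorrt.__version__)   # standalone library, e.g. "5.1.5.0"

  # The TF-TRT bridge only emits TRTEngineOp nodes if the tensorflow-gpu
  # wheel itself was compiled against a matching TensorRT; a wheel built
  # without TensorRT reports version 0.0.0 and converts nothing.
  import tensorflow.contrib.tensorrt as trt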

Could you help me with this issue? Is my environment wrong?

Much thanks,

Bigbai

trt_engine_nodes in TensorRT graph: 0

TensorRT model is successfully stored!
numb. of all_nodes in frozen graph: 8016
numb. of trt_engine_nodes in TensorRT graph: 0
numb. of all_nodes in TensorRT graph: 882

I don't know why the model has no TensorRT engine nodes and the TRT model's FPS is the same as the original's.
Environment: Ubuntu 18.04, GTX 1080 Ti, TensorRT 5.1.5.0, TensorFlow 1.14.0.

Converted YOLOv3 model has the same size as before

Hi,
First of all, thanks for your tutorial. It helps me a lot.
I am confused: is it normal that the converted yoloRT.pb model still has the same size as before?
In my case, I see the number of nodes go down from 1800 to 1400, but the model size has not changed.
And eventually, the speed of the model didn't improve much either.

Looking forward to your response :)

TensorFlow to frozen model error

Hello, I am using a Linux-based Jetson Nano, and when trying to convert my TensorFlow model to a frozen model I get this error:

assert d in name_to_node, "%s is not in graph" % d
AssertionError: output_tensor/softmax is not in graph

INT8 support

So I tried using INT8 instead of FP16 for optimizing YOLOv3. Instead of getting a speedup, it was taking 1200+ ms per image.
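
For reference, a minimal sketch of the INT8 path in the TF 1.x contrib API. My assumption (not a confirmed diagnosis) is that INT8 mode first produces a calibration graph that must be fed representative images before conversion to the final inference graph, and running the uncalibrated graph is very slow:

  # Minimal sketch (TF 1.x contrib API): INT8 requires a calibration pass.
  calib_graph = trt.create_inference_graph(
      input_graph_def=frozen_graph,
      outputs=your_outputs,
      max_batch_size=1,
      max_workspace_size_bytes=2*(10**9),
      precision_mode="INT8")
  # ... feed representative input images through calib_graph here ...
  trt_graph = trt.calib_graph_to_infer_graph(calib_graph)  # final INT8 graph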

My environment:
Ubuntu 18.10
Python 3.7.1
CUDA 10.0
cuDNN 7.5.0
Tensorflow-gpu 1.13.1
TensorRT 5.0.2.6
GTX 1070

Run error

Hello,
I am getting this error of "Killed" when I run the first step of the converter.

difficulty generating correct frozen .pb

I understand that you generated the provided yolov3_gpu_nms.pb using the pretrained COCO weights from https://pjreddie.com/media/files/yolov3.weights, placed in the checkpoint dir of this repo: https://github.com/ardianumam/tensorflow-yolov3

My question is: what exactly did you run to generate the .pb? I generated my .pb with python convert_weight.py --convert --freeze, but it doesn't give the correct output tensors, as illustrated below.

  input_tensor, output_tensors = \
      utils.read_pb_return_tensors(tf.get_default_graph(),
                                   GIVEN_ORIGINAL_YOLOv3_MODEL,
                                   ["Placeholder:0", "concat_9:0", "mul_9:0"])
  print("\n\ninput_tensor\n", input_tensor)
  print("\n\noutput_tensors\n", output_tensors)

Output for the provided .pb (this works):
input_tensor
 Tensor("import/Placeholder:0", shape=(1, 416, 416, 3), dtype=float32)

output_tensors
 [<tf.Tensor 'import/concat_9:0' shape=(1, 10647, 4) dtype=float32>, <tf.Tensor 'import/mul_9:0' shape=(1, 10647, 80) dtype=float32>]

Output for my .pb (this doesn't work; note the output tensor shapes):
input_tensor
 Tensor("import/Placeholder:0", shape=(1, 416, 416, 3), dtype=float32)

output_tensors
 [<tf.Tensor 'import/concat_9:0' shape=(1, 10647, 4) dtype=float32>, <tf.Tensor 'import/mul_9:0' shape=(?,) dtype=int32>]

[I've tried this conversion using TF 1.11, 1.12, and 1.13rc2; all give the same results.]

Changing number of classes of YOLOv3

Hi,
I want to know how the line your_outputs = ["Placeholder:0", "concat_9:0", "mul_9:0"] would change if, instead of having 10 classes as output, we only have 4 classes.
An error appears when I convert my graph to the TRT graph:

Cannot assign a device for operation 'unstack_9': Could not satisfy explicit device specification '/device:GPU:0' because no supported kernel for GPU devices is available.
Registered kernels:
device='CPU'; T in [DT_VARIANT]
device='CPU'; T in [DT_RESOURCE]
device='CPU'; T in [DT_STRING]
device='CPU'; T in [DT_BOOL]
device='CPU'; T in [DT_COMPLEX128]
device='CPU'; T in [DT_COMPLEX64]
device='CPU'; T in [DT_DOUBLE]
device='CPU'; T in [DT_FLOAT]
device='CPU'; T in [DT_BFLOAT16]
device='CPU'; T in [DT_HALF]
device='CPU'; T in [DT_INT8]
device='CPU'; T in [DT_UINT8]
device='CPU'; T in [DT_INT16]
device='CPU'; T in [DT_UINT16]
device='CPU'; T in [DT_INT32]
device='CPU'; T in [DT_INT64]
device='GPU'; T in [DT_INT64]
device='GPU'; T in [DT_INT32]
device='GPU'; T in [DT_BFLOAT16]
device='GPU'; T in [DT_DOUBLE]
device='GPU'; T in [DT_FLOAT]
device='GPU'; T in [DT_HALF]

[[{{node unstack_9}} = UnpackT=DT_STRING, axis=0, num=24, _device="/device:GPU:0"]]
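
A workaround worth trying (my suggestion, not from the original thread): the failing unstack_9 runs on DT_STRING, for which no GPU kernel is registered, so letting TensorFlow fall back to the CPU for such ops avoids the explicit-device error:

  # Sketch: allow ops without a GPU kernel (e.g. Unpack on DT_STRING) to be
  # placed on the CPU instead of failing on the explicit GPU device spec.
  config = tf.ConfigProto(allow_soft_placement=True)
  with tf.Session(config=config) as sess:
      pass  # run the conversion / inference here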

Thank you

Yolov3 model operation did not accelerate significantly

Hello, thank you very much for sharing. It's a great project.
After I used TensorRT to convert the YOLOv3 model according to your steps, I found that the speed improvement is not obvious.
The FPS of the original model and the TensorRT model are both about 30.
The platform I use is Google Colab. It's strange that on a CPU-only platform the acceleration is obvious, while on the GPU platform it is not.
I'm very confused about this. What's wrong?
(Has Google Colab already optimized the model computation?)

Baseline configuration

Hi,

Please update the README.md with a description of your environment. I know from your last video that you're probably running on a desktop with a GTX 1060, but which versions of the NVIDIA drivers, CUDA, cuDNN and TensorRT?
Similarly, please provide your setup for the Jetson.

No Speedup Observed

Hi, thank you for this excellent tutorial guide! The problem I encountered is that I was unable to observe any speedup when I ran the code in 7_optimizing_YOLOv3_using_TensorRT. In fact, both with and without TensorRT I observed a rate of only 1.5 fps.

There were 2 warnings in the console:

  1. I tensorflow/core/grappler/devices.cc:51] Number of eligible GPUs (core count >= 8): 0
  2. I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
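
The first warning suggests TensorFlow does not see an eligible GPU, which by itself would explain the absence of any TF-TRT speedup; a minimal check, assuming TF 1.x:

  from tensorflow.python.client import device_lib

  # If no "/device:GPU:0" entry appears here, TensorFlow is running on the
  # CPU only and TF-TRT cannot place any TRT engine on a GPU.
  print(device_lib.list_local_devices())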

Any advice is appreciated!

Requirements file please

Dear famous and glorious author,

I am glad there is someone using TensorRT. But could you please add a requirements.txt, so that I don't face errors like "no module named ..." while running the notebooks?

Thanks

How to implement TensorRT on custom number of classes

Hi,
I have a weights file trained on darknet for two classes.
I used your repo https://github.com/ardianumam/tensorflow-yolov3 to convert my weights file to a frozen graph.
Since num_classes is read from coco.names, I changed coco.names to the classes I have.
I successfully generated the frozen model and copied it over to your TensorRT repo to generate the TensorRT model.
It created the TensorRT model successfully.
When I run the inference block of the code, it returns the following error:

2019-07-15 13:15:25.604401: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-07-15 13:15:25.719409: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:964] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-07-15 13:15:25.719738: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1411] Found device 0 with properties:
name: GeForce GTX 970 major: 5 minor: 2 memoryClockRate(GHz): 1.2405
pciBusID: 0000:02:00.0
totalMemory: 3.94GiB freeMemory: 3.70GiB
2019-07-15 13:15:25.719751: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1490] Adding visible gpu devices: 0
2019-07-15 13:15:26.679452: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-07-15 13:15:26.679482: I tensorflow/core/common_runtime/gpu/gpu_device.cc:977] 0
2019-07-15 13:15:26.679488: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990] 0: N
2019-07-15 13:15:26.679607: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1103] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 2017 MB memory) -> physical GPU (device: 0, name: GeForce GTX 970, pci bus id: 0000:02:00.0, compute capability: 5.2)
Segmentation fault (core dumped)

Since my number of classes differs from COCO's 80, should I be following any additional steps?
Please excuse my limited knowledge of TensorFlow, as I am not from a computer science background.

Thanks.

typo in path for demo video

In the YOLO .ipynb, the demo video path is given as video_path = "./dataset/demo_video/road.mp4" # path for video input, but in the repo the file is 'road2.mp4'.

how to get tf format for custom yolo weights

Thanks a lot for this highly educational repo!

I would like to train my own YOLOv3 model and try deploying it as a TRT engine the way you have. How would you recommend doing so? Did you use a TF implementation of YOLOv3 to generate your weights, or did you convert from the darknet .weights/.cfg? Whichever approach, I'd be interested to know which specific repos/tools you used.

Custom TensorFlow-Yolov3 to TF-TRT

Hi,
I am using a TensorFlow version of YOLOv3; it's not the same as the darknet one. It uses two YOLOv3 networks for feature extraction from visual and infrared images, then performs feature fusion and finally object detection.

My project runs on a GTX 1080 Ti at about 40 FPS, but on a Xavier NX the speed is 2 FPS.
Now my goal is to convert this TensorFlow model to ONNX and a TRT engine to speed it up on the Xavier NX.
I have weights in both .ckpt and .pb format.
1. What steps should I follow? I am really confused; there are many confusing articles about TF-TRT and TensorRT but no clear guidelines.
2. Do I need to use TF-TRT or TensorRT?
3. Can you give me a roadmap for this task? Will it help speed up detection on the Xavier NX?

I have spent about 15 days trying by myself but failed, so I finally decided to post my question here. I hope you can guide me in this regard.

Thanks.

No performance improvement on GTX 1080 Ti

Hi,
I optimized my trained models (1 class), ssdlite_mobilenetv2 and ssd_resnet50, with TensorRT, but the performance didn't improve significantly: I only went from 0.12 s to 0.11 s per inference on a GTX 1080 Ti. Why?
I installed TensorFlow 1.12.0, CUDA 9, and TensorRT 4.0.1.6 on Ubuntu 16.04.
