
YOLOv3 model compression and acceleration (quantization, sparsity), C++ version


yolov3_lite

My code must run on industrial embedded devices with limited compute resources, so I have to compress and accelerate the models step by step until the inference time meets my boss's requirement :(

The backbone of this project is yolov3-lite, plus an optimized version of it.

While building this project I referenced several GitHub projects and CVPR papers; thanks to their authors.

I will keep updating the repo, so please stay tuned.

All acceleration switches can be found in the Makefile.

[What tricks I used]

Multiple Threads

Set OPENMP := 1 in Makefile

If you have ever used multithreading on ARM or x86 chips, you must know OpenMP.

OpenMP distributes loop iterations across worker threads, and it has many tricks to make the threads cooperate well.
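As a hedged illustration (not the project's actual code), a single `#pragma omp parallel for` is enough to split a darknet-style array loop across threads; built without `-fopenmp`, the pragma is simply ignored and the loop runs serially:

```c
/* Element-wise sum of two float arrays. With OpenMP enabled
 * (e.g. gcc -fopenmp), the loop iterations are divided among the
 * worker threads; without it, the pragma is ignored and the loop
 * runs serially with identical results. */
void add_arrays(const float *a, const float *b, float *out, int n)
{
    int i;
    #pragma omp parallel for
    for (i = 0; i < n; ++i) {
        out[i] = a[i] + b[i];
    }
}
```

Darknet applies this same pattern to its heavy loops, such as the gemm routines, which is why a single Makefile switch is enough to enable it.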

The measured speedup with OpenMP enabled:

(benchmark image)

Kernel Mask (network sparsity)

Set MASK := 1 in Makefile

It is a common method for reducing the computation of conv layers. The key point is deciding which kernels are important and which kernels can be deleted.

In this project I referenced the paper
Accelerating Convolutional Networks via Global & Dynamic Filter Pruning, from Tencent's lab.
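The paper's global & dynamic scheme is more involved, but the core mechanism can be sketched as follows (a hypothetical illustration, not this repo's implementation): score each conv kernel by the L1 norm of its weights, keep the strongest fraction, and zero out the rest via a binary mask:

```c
#include <math.h>
#include <stdlib.h>

/* Mask out the weakest filters of one conv layer.
 * weights:     n_filters * filter_size values, filter-major
 * keep_ratio:  fraction of filters to keep (e.g. 0.5)
 * mask:        output, 1 = kept, 0 = masked (weights zeroed) */
void mask_filters(float *weights, int n_filters, int filter_size,
                  float keep_ratio, int *mask)
{
    /* Importance score of each filter: L1 norm of its weights. */
    float *score = malloc(n_filters * sizeof(float));
    for (int f = 0; f < n_filters; ++f) {
        float s = 0;
        for (int i = 0; i < filter_size; ++i)
            s += fabsf(weights[f * filter_size + i]);
        score[f] = s;
    }
    /* Keep a filter if it ranks inside the top keep_ratio part;
     * otherwise set its mask to 0 and zero its weights. */
    int keep = (int)(keep_ratio * n_filters);
    for (int f = 0; f < n_filters; ++f) {
        int rank = 0;
        for (int g = 0; g < n_filters; ++g)
            if (score[g] > score[f]) ++rank;
        mask[f] = rank < keep;
        if (!mask[f])
            for (int i = 0; i < filter_size; ++i)
                weights[f * filter_size + i] = 0;
    }
    free(score);
}
```

The "dynamic" part of the paper additionally allows masked filters to be revived during training; the sketch above only shows the static masking step.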

The measured speedup with the kernel mask enabled:

(benchmark image)

Weight Pruning

Set PRUNE := 1 in Makefile

This method is very simple: every weight whose magnitude is below a threshold is set to 0, so it needs no further introduction.
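The whole trick fits in a few lines; a minimal sketch (illustrative, not the repo's exact code):

```c
#include <math.h>

/* Magnitude pruning: set every weight whose absolute value is
 * below the threshold to zero. Returns how many weights were pruned. */
int prune_weights(float *w, int n, float threshold)
{
    int pruned = 0;
    for (int i = 0; i < n; ++i) {
        if (fabsf(w[i]) < threshold) {
            w[i] = 0.0f;
            ++pruned;
        }
    }
    return pruned;
}
```

Note that zeroed weights only save time if the inference kernels actually skip them (or the weights are stored in a sparse format); the masking above prepares the model for that.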

The measured speedup with the kernel mask and weight pruning combined:

(benchmark image)

L1 Regularization

L1 regularization can be regarded as another way to reduce kernels; the principle is similar to kernel pruning via BN scale parameters in other papers.

YOLO uses L2 regularization by default, so you need to change it to L1 in the code. This method has a disadvantage: you need to update the cfg files after every epoch (after one epoch of training you know how many kernels to keep in each conv layer). If you want to know more about L1 and L2 regularization in YOLO, see my blog.
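The switch only changes the weight-decay term of the update. A hedged sketch of the two decay steps (darknet's real update lives in its layer code): L2 shrinks each weight proportionally to its value, while L1 subtracts a fixed amount in the direction of the weight's sign, which drives small weights exactly to zero and so empties whole kernels over time:

```c
#include <math.h>

/* One weight-decay step for each regularizer (illustrative only).
 * lr is the learning rate, lambda the decay strength. */
void decay_l2(float *w, int n, float lr, float lambda)
{
    for (int i = 0; i < n; ++i)
        w[i] -= lr * lambda * w[i];           /* gradient of (lambda/2)*w^2 */
}

void decay_l1(float *w, int n, float lr, float lambda)
{
    for (int i = 0; i < n; ++i) {
        float sign = (w[i] > 0) - (w[i] < 0); /* gradient of lambda*|w| */
        w[i] -= lr * lambda * sign;
    }
}
```

Under L2, a weight near zero receives an almost-zero decay and lingers; under L1, it receives the full `lr * lambda` push and snaps to zero, which is exactly why L1 produces the kernel sparsity exploited here.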

The measured speedup with L1 regularization enabled:

(benchmark image)

Quantization

In the domain of network acceleration, quantization is always the most important trick. I have implemented two quantization schemes, which can be switched in the Makefile.

Set QUANTIZATION := 1 in Makefile

This module was imported from AlexeyAB's GitHub repo.

As he describes, this quantization method follows NVIDIA's TensorRT approach.

When I tested this module it did not work well, so I recently added Google's quantization method alongside it.

Set QUANTIZATION_GOOGLE := 1 in Makefile

Paper: Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference, CVPR 2018

The most novel idea is plugging fake quantization into the training process: you get the input quantization scales directly after training, instead of running a calibration pass over a calibration dataset.

To make the project deployable on embedded devices, I added Google's gemmlowp to darknet.

Depthwise Conv

This is the key idea of MobileNet, and the author has already merged it into yolov3. I optimized the code so that l.groups can be used in every module.
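As an illustration of what `l.groups` enables (a simplified sketch, not darknet's implementation): in a depthwise convolution each input channel is convolved with its own single filter, instead of every filter spanning all channels, which cuts the multiply count by a factor of the channel count:

```c
/* Depthwise 3x3 convolution (groups == channels, stride 1, no
 * padding): channel ch of the input is convolved only with its
 * own 3x3 filter.
 * in:      c x h x w input, channel-major
 * filters: c * 9 weights, one 3x3 filter per channel
 * out:     c x (h-2) x (w-2) output */
void depthwise_conv3x3(const float *in, int c, int h, int w,
                       const float *filters, float *out)
{
    int oh = h - 2, ow = w - 2;
    for (int ch = 0; ch < c; ++ch)
        for (int y = 0; y < oh; ++y)
            for (int x = 0; x < ow; ++x) {
                float s = 0;
                for (int ky = 0; ky < 3; ++ky)
                    for (int kx = 0; kx < 3; ++kx)
                        s += in[(ch * h + y + ky) * w + x + kx]
                           * filters[ch * 9 + ky * 3 + kx];
                out[(ch * oh + y) * ow + x] = s;
            }
}
```

A grouped conv with `1 < groups < channels` sits between this and a full convolution; generalizing `l.groups` lets every darknet conv layer pick its point on that trade-off.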

[How to train the repo]

  1. Analyze your original net and decide which modules you need.

  2. Change the Makefile and enable the modules; for example, to use the kernel mask, set
  `MASK=1`

  3. Start training:

    ./darknet detector train [data_file path] cfg/yolov3.cfg [pretrain weights file]

  4. Start testing; set `GPU=0`:

    ./darknet detector test [data_file path] cfg/yolov3.cfg [weights file] [image file to detect]

[How to test the repo]

I have a pretrained model in backup, so you can have a try :)

  1. Analyze your original net and decide which modules you need.

  2. Change the Makefile and enable the modules; for example, to use the kernel mask, set
  `MASK=1`

  3. Normal test:

    ./darknet detector test [data_file path] cfg/yolov3-tiny-mask.cfg backup/yolov3-tiny-mask.backup 000023.jpg

  4. Test with NVIDIA quantization:
     1) Set `QUANTIZATION := 1`
     2) ./darknet detector test [data_file path] cfg/yolov3-tiny-mask.cfg backup/yolov3-tiny-mask.backup 000023.jpg -quantized

  5. Test with Google quantization:
     1) Set `QUANTIZATION_GOOGLE := 1`
     2) ./darknet detector test [data_file path] cfg/yolov3-tiny-mask.cfg backup/yolov3-tiny-mask.backup 000023.jpg

[Something more]

1. I added F1-score test code; the command is:

./darknet detector f1 [data_file path] cfg/yolov3-tiny-mask.cfg backup/yolov3-tiny-mask.backup

2. I also have some other modules, such as `Hash Compress` and `Huffman Compress`, but I can't release all of them for other reasons.

3. When I tested all the methods on a tiny net (not on VGG), they reduced inference time by 30%~50% with very little F1 drop; if you want it even faster, use quantization, it will surprise you!

If you want to use my code, please let me know!

yolov3_lite's People

Contributors: artyze

yolov3_lite's Issues

L1 scales regularization

Hello, thank you for sharing. Could you explain which parts of the source code need to be modified for the L1 scales regularization method?

AVX Problem

When I build on a machine without AVX support, make fails with an error.

How to get output_scale and output_zero_point in the cfg file

Author, thanks for your work; I want to run the Google quantization in this repo, but I don't know how to obtain output_scale and output_zero_point in the cfg file:
output_scale = 0, 0, 0.0875, 0, 0.075846, 0, 0.069873, 0, 0.050603, 0, 0.031124, 0, 0.023692, 0.021359, 0.041686
output_zero_point = 0, 0, 94, 0, 101, 0, 131, 0, 129, 0, 110, 0, 122, 131, 99
Also, your example has no 2007_train.txt and no data; can you provide them? Can you answer me? Thank you very much.

Prune weights to support yolo-v3

Do you have code for compressing the full yolo-v3 model? A yolo-v3 model at 416x416 needs 2~3 GB of GPU memory, so do you have any suggestions for reducing the memory?

Usage of this repo

@ArtyZe Thanks for this repo; I have a few queries:

  1. How different is your repo from a standard darknet?
  2. What was the weight file size before and after pruning?
  3. Can I plot the weight values before and after pruning to see the difference?
