zw-shen / mutualguide

This project forked from zhanghengdev/mutualguide

Localize to Classify and Classify to Localize: Mutual Guidance in Object Detection

Home Page: https://openaccess.thecvf.com/content/ACCV2020/html/Zhang_Localize_to_Classify_and_Classify_to_Localize_Mutual_Guidance_in_ACCV_2020_paper.html

License: MIT License

Shell 1.67% Python 98.33%

Introduction

MutualGuide is a compact object detector designed for edge computing devices. Compared to existing detectors, this repo has two key features.

First, the Mutual Guidance mechanism assigns labels for the classification task based on the predictions of the localization task, and vice versa, which alleviates the misalignment between the two tasks. Second, teacher-student prediction disagreements guide the knowledge transfer in a feature-based detection distillation framework, which reduces the performance gap between the two models.
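The mutual label assignment can be illustrated with a small sketch. This is a simplified illustration and not the repository's code: the function names, the top-k selection rule and the amplification formula `iou ** (1 - score)` are assumptions chosen to show the idea that each task's labels depend on the other task's predictions.

```python
from typing import List, Sequence

Box = Sequence[float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def top_k(scores: List[float], k: int) -> List[int]:
    """Indices of the k highest scores, returned in ascending index order."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(order[:k])


def localize_to_classify(pred_boxes: List[Box], gt: Box, k: int) -> List[int]:
    """Positives for classification: anchors whose *predicted* boxes fit the ground truth best."""
    return top_k([iou(p, gt) for p in pred_boxes], k)


def classify_to_localize(anchors: List[Box], cls_scores: List[float], gt: Box, k: int) -> List[int]:
    """Positives for localization: anchor IoU amplified by the classification score.

    The exponent (1 - score) is an illustrative assumption: a confident
    classification pushes a non-zero IoU towards 1.
    """
    amplified = []
    for anchor, score in zip(anchors, cls_scores):
        base = iou(anchor, gt)
        amplified.append(base ** (1.0 - score) if base > 0 else 0.0)
    return top_k(amplified, k)


gt = (0, 0, 10, 10)
anchors = [(0, 0, 10, 6), (0, 0, 10, 5), (20, 20, 30, 30)]
pred_boxes = [(0, 0, 10, 6), (0, 0, 10, 10), (20, 20, 30, 30)]
cls_scores = [0.1, 0.9, 0.9]

cls_positives = localize_to_classify(pred_boxes, gt, k=1)
loc_positives = classify_to_localize(anchors, cls_scores, gt, k=1)
```

In this toy input the second anchor overlaps the ground truth less than the first, but its high classification score makes it the positive sample for localization.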

For more details, please refer to our ACCV paper and BMVC paper.

Planning

  • Train medium and large models.
  • Add SIOU loss.
  • Add CspDarknet backbone.
  • Add RepVGG backbone.
  • Add ShuffleNetV2 backbone.
  • Add SwinTransformer backbone.
  • Add TensorRT transform code for inference acceleration.
  • Add vis function to plot detection results.
  • Add custom dataset training (annotations in XML format).

Benchmark

Backbone         Size     APval     APval  APval  APval  APval   APval  Params  FLOPs  Speed
                          0.5:0.95  0.5    0.75   small  medium  large  (M)     (G)    (ms)
cspdarknet-0.75  640x640  43.0      61.1   46.2   24.2   50.0    59.9   24.32   24.02  11.4 (3060)
cspdarknet-0.5   640x640  40.4      58.4   43.3   21.0   46.4    58.0   17.40   12.67  6.5 (3060)
shufflenet-1.5   640x640  35.7      53.9   37.9   16.5   41.3    53.5   2.55    2.65   5.6 (3060)
shufflenet-1.0   640x640  31.8      49.0   33.1   13.6   35.8    48.4   1.50    1.47   5.4 (3060)

Remarks:

  • The precision is measured on the COCO2017 Val dataset.
  • The inference runtime is measured with the PyTorch framework (without TensorRT acceleration) on an RTX 3060 GPU. Post-processing time (e.g., NMS) is not included, i.e., only the model inference time is measured.
  • To download from Baidu cloud, go to this link (password: mugu).
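The timing remark above (model inference only, no post-processing) can be reproduced with a loop like the following. This is a minimal sketch and not the script used for the benchmark: `time_forward` and its `warmup` and `runs` parameters are hypothetical, and `forward` stands for any callable that runs one forward pass.

```python
import time
from statistics import mean
from typing import Callable, List


def time_forward(forward: Callable[[], object], warmup: int = 10, runs: int = 50) -> float:
    """Return the mean latency of forward() in milliseconds."""
    # Warm-up iterations are discarded so that lazy initialisation does not skew the result.
    for _ in range(warmup):
        forward()

    samples: List[float] = []
    for _ in range(runs):
        start = time.perf_counter()
        forward()
        # On a GPU, call torch.cuda.synchronize() here before reading the clock,
        # because CUDA kernels are launched asynchronously.
        samples.append((time.perf_counter() - start) * 1000.0)
    return mean(samples)


calls = []
latency_ms = time_forward(lambda: calls.append(1), warmup=2, runs=3)
```

NMS and other post-processing steps would be called outside `forward` so that they stay out of the measured interval.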

Datasets

First download the COCO2017 dataset; you may find the scripts in data/scripts/ helpful. Then set the parameter self.root in data/coco.py to the path of the COCO dataset:

self.root = os.path.join("/home/heng/Documents/Datasets/", "COCO/")

Remarks:

  • For training on a custom dataset, first modify the dataset path and the categories XML_CLASSES in data/xml_dataset.py. Then pass --dataset XML.
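As an illustration of what an XML annotation may look like and how it maps to training targets, here is a minimal sketch. It assumes a Pascal VOC-style layout (`object`, `name`, `bndbox`), which this README does not confirm. The class names and the `parse_annotation` helper are hypothetical and are not taken from data/xml_dataset.py.

```python
import xml.etree.ElementTree as ET
from typing import List, Sequence, Tuple

# Hypothetical categories; replace with the classes of your own dataset.
XML_CLASSES = ("person", "car")

# Assumed Pascal VOC-style annotation, used only for this example.
EXAMPLE = """<annotation>
  <object>
    <name>car</name>
    <bndbox><xmin>10</xmin><ymin>20</ymin><xmax>110</xmax><ymax>220</ymax></bndbox>
  </object>
  <object>
    <name>tree</name>
    <bndbox><xmin>0</xmin><ymin>0</ymin><xmax>5</xmax><ymax>5</ymax></bndbox>
  </object>
</annotation>"""


def parse_annotation(
    xml_text: str, classes: Sequence[str] = XML_CLASSES
) -> List[Tuple[float, float, float, float, int]]:
    """Return (xmin, ymin, xmax, ymax, class_index) for every known object."""
    root = ET.fromstring(xml_text)
    targets = []
    for obj in root.iter("object"):
        name = obj.findtext("name", default="").strip()
        box = obj.find("bndbox")
        if name not in classes or box is None:
            continue  # skip categories that are not listed in XML_CLASSES
        coords = [float(box.findtext(tag)) for tag in ("xmin", "ymin", "xmax", "ymax")]
        targets.append((*coords, list(classes).index(name)))
    return targets


targets = parse_annotation(EXAMPLE)
```

Objects whose category is not listed are dropped here; a real loader might instead raise an error to catch typos in the class list.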

Training

For training with Mutual Guide:

$ python3 train.py --neck ssd --backbone vgg16    --dataset COCO
                          fpn            resnet34           VOC
                          pafpn          repvgg-A2          XML
                                         cspdarknet-0.75
                                         shufflenet-1.0
                                         swin-T

For knowledge distillation using PDF-Distil:

$ python3 distil.py --neck ssd --backbone vgg11    --dataset COCO  --kd pdf
                           fpn            resnet18           VOC
                           pafpn          repvgg-A1          XML
                                          cspdarknet-0.5
                                          shufflenet-0.5

Remarks:

  • For training without MutualGuide, pass --mutual_guide False;
  • For training on a custom dataset, convert your annotations into XML format and pass --dataset XML. An example is given in datasets/XML/;
  • For knowledge distillation with a traditional MSE loss, pass --kd mse;
  • The default folder to save trained model is weights/.
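The difference between the two --kd options can be sketched as follows. This is a simplified illustration and not the code in distil.py: features and prediction scores are reduced to one scalar per location, and using the absolute score difference as the disagreement weight is an assumption made for this example.

```python
from typing import List


def mse_distill_loss(student_feat: List[float], teacher_feat: List[float]) -> float:
    """Plain feature imitation: every location contributes equally."""
    diffs = [(s - t) ** 2 for s, t in zip(student_feat, teacher_feat)]
    return sum(diffs) / len(diffs)


def disagreement_weighted_loss(
    student_feat: List[float],
    teacher_feat: List[float],
    student_pred: List[float],
    teacher_pred: List[float],
) -> float:
    """Feature imitation weighted by how much the two models' predictions disagree."""
    weights = [abs(sp - tp) for sp, tp in zip(student_pred, teacher_pred)]
    total = sum(weights)
    if total == 0:
        return 0.0  # the models already agree everywhere
    weighted = sum(w * (s - t) ** 2 for w, s, t in zip(weights, student_feat, teacher_feat))
    return weighted / total


student_feat, teacher_feat = [1.0, 2.0], [1.0, 4.0]
student_pred, teacher_pred = [0.5, 0.1], [0.5, 0.9]

plain = mse_distill_loss(student_feat, teacher_feat)
weighted = disagreement_weighted_loss(student_feat, teacher_feat, student_pred, teacher_pred)
```

In this toy input the weighted loss concentrates entirely on the second location, where the student's prediction differs from the teacher's.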

Evaluation

To evaluate a trained network:

$ python3 test.py --neck ssd --backbone vgg11    --dataset COCO --trained_model path_to_saved_weights --vis
                         fpn            resnet18           VOC
                         pafpn          repvgg-A1          XML
                                        cspdarknet-0.5
                                        shufflenet-0.5

Remarks:

  • It will directly print the mAP, AP50 and AP75 results on COCO2017 Val;
  • Add the parameter --vis to draw detection results. They will be saved in draw/VOC/, draw/COCO/ or draw/XML/.
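For readers unfamiliar with the COCO metrics: the 0.5:0.95 column of the benchmark averages AP over ten IoU thresholds, while the 0.5 and 0.75 columns each use a single threshold. The sketch below only illustrates that thresholding; the helper names are hypothetical, and the actual evaluation is performed by test.py.

```python
from typing import List


def coco_iou_thresholds() -> List[float]:
    """The ten IoU thresholds 0.50, 0.55, ..., 0.95 used for COCO mAP."""
    return [round(0.5 + 0.05 * i, 2) for i in range(10)]


def is_true_positive(iou_value: float, threshold: float) -> bool:
    """A matched detection counts as correct if its IoU reaches the threshold."""
    return iou_value >= threshold


thresholds = coco_iou_thresholds()
# A detection overlapping its ground truth with IoU 0.6 is correct at some thresholds only.
passed = [t for t in thresholds if is_true_positive(0.6, t)]
```

A detection with IoU 0.6 therefore counts towards the metric at threshold 0.5 but not at 0.75, which is why the 0.75 column is always the lower of the two.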

Citing us

Please cite our papers in your publications if they help your research:

@InProceedings{Zhang_2020_ACCV,
    author    = {Zhang, Heng and Fromont, Elisa and Lefevre, Sebastien and Avignon, Bruno},
    title     = {Localize to Classify and Classify to Localize: Mutual Guidance in Object Detection},
    booktitle = {Proceedings of the Asian Conference on Computer Vision (ACCV)},
    month     = {November},
    year      = {2020}
}

@InProceedings{Zhang_2021_BMVC,
    author    = {Zhang, Heng and Fromont, Elisa and Lefevre, Sebastien and Avignon, Bruno},
    title     = {PDF-Distil: including Prediction Disagreements in Feature-based Distillation for object detection},
    booktitle = {Proceedings of the British Machine Vision Conference (BMVC)},
    month     = {November},
    year      = {2021}
}

Acknowledgement

This project contains pieces of code from the following projects: ssd.pytorch, rfbnet, mmdetection and yolox.
