
kaggle-protect-the-great-barrier-reef's Introduction

Team Reefsave Help Protect the Great Barrier Reef

This repo hosts notebooks and helper shell and Python files used for data preprocessing and for training the Faster R-CNN, YOLOv5, and YOLOv4 object detection neural networks.


Literature

The choice of a neural network depends on the available software and hardware resources, the required speed, and the expected accuracy. Object detection networks are classified as multi-stage or single-stage.

Examples of single-stage networks are SSD and YOLO. Multi-stage approaches add a region proposal step to their architectures, which uses the feature maps extracted by the backbone to propose candidate object regions before classification. Examples of multi-stage networks are R-CNN and R-FCN.

Architecture of a neural network

Object detection networks consist of an input, a backbone, a neck, and a head. The input takes in an image and passes it to the backbone, a feature extractor consisting of convolution and max pooling layers. Residual Network (ResNet), ResNeXt, DenseNet, and VGG16 are commonly used backbones. They are usually pretrained on standard datasets such as COCO or ImageNet.
The neck collects and combines feature maps from different stages of the backbone, e.g. the Feature Pyramid Network (FPN). The head of a single-stage network is a dense prediction layer, whereas the head of a two-stage detector (i.e. R-CNN and R-FCN) performs sparse prediction.
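To make "dense prediction" concrete, the sketch below counts how many candidate boxes a single-stage head emits when it predicts on every grid cell of several feature maps. The strides (8, 16, 32) and three anchors per cell are illustrative assumptions typical of YOLO-style detectors, not values read from this repo, and `dense_prediction_count` is a hypothetical helper.

```python
def dense_prediction_count(image_size, strides=(8, 16, 32), anchors_per_cell=3):
    """Count candidate boxes emitted by a dense (single-stage) head.

    Assumes a square input and one square grid per stride. The default
    strides and anchor count are illustrative assumptions.
    """
    total = 0
    for stride in strides:
        cells_per_side = image_size // stride  # grid resolution at this scale
        total += cells_per_side * cells_per_side * anchors_per_cell
    return total


# 80x80, 40x40 and 20x20 grids with 3 anchors each
print(dense_prediction_count(640))  # 25200
```

A two-stage detector instead scores only a limited set of proposed regions, which is why its head is described as sparse.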

[Figure 1. Schematic representation of single-stage and multi-stage neural networks. Source: arXiv]

The choice of a neural network

Computational resources determine the amount of time spent on training and inference. GPU and TPU runtimes accelerate both training and inference. The demand for computational resources differs from one model to another.

Speed is key in real-time object detection systems and video search engines. Speed must be balanced against resource requirements to achieve optimal performance.
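One simple way to compare speed across models is to time inference over a fixed batch of frames and report frames per second. The sketch below is a minimal illustration: `measure_fps` and `dummy_detector` are hypothetical names, and the dummy function only stands in for a real model's inference call.

```python
import time


def measure_fps(infer, inputs, warmup=2):
    """Return the average frames per second of `infer` over `inputs`."""
    # Warm-up calls are excluded from timing (caches, lazy initialisation).
    for item in inputs[:warmup]:
        infer(item)
    start = time.perf_counter()
    for item in inputs:
        infer(item)
    elapsed = time.perf_counter() - start
    return len(inputs) / elapsed if elapsed > 0 else float("inf")


def dummy_detector(frame):
    # Hypothetical stand-in for a detector's forward pass.
    return sum(frame)


frames = [[0] * 1000 for _ in range(50)]
print(f"{measure_fps(dummy_detector, frames):.1f} FPS")
```

When timing a real model on a GPU, the device usually has to be synchronised before reading the clock, otherwise asynchronous execution makes the model look faster than it is.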

The minimum viable product (MVP) for module one was based on the performance of Faster R-CNN with a ResNet Inception backbone, YOLOv4, and YOLOv5 on the pre-processed TensorFlow - Help Protect the Great Barrier Reef dataset.
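The preprocessing steps themselves are not listed here. As an illustration of one step commonly needed before training YOLO models, the sketch below converts a pixel-space box to a YOLO label line (`class cx cy w h`, all normalised to the image size). The input layout (top-left `x`, `y`, `width`, `height` in pixels) and the helper names are assumptions, not the repo's actual code.

```python
def to_yolo_box(x, y, width, height, image_width, image_height):
    """Convert a top-left pixel box to normalised (cx, cy, w, h)."""
    cx = (x + width / 2) / image_width
    cy = (y + height / 2) / image_height
    return cx, cy, width / image_width, height / image_height


def yolo_label_line(class_id, box, image_width, image_height):
    """Format one annotation as a line of a YOLO label file."""
    cx, cy, w, h = to_yolo_box(*box, image_width, image_height)
    return f"{class_id} {cx:.6f} {cy:.6f} {w:.6f} {h:.6f}"


# Assumed example: a 64x72 px box at (128, 180) in a 1280x720 frame
print(yolo_label_line(0, (128, 180, 64, 72), 1280, 720))
# 0 0.125000 0.300000 0.050000 0.100000
```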

YOLO (single stage)

YOLO is a state-of-the-art single-stage object detection algorithm. There are four documented versions of YOLO, and a fifth version, YOLOv5, was designed by the Ultralytics team. YOLOv5 is described as a YOLOv4 implementation in PyTorch. Compared with other algorithms, YOLOv5 performs exceptionally well with less GPU time.

According to Bochkovskiy et al. [2], YOLOv4 attains an average precision of 43.5% on the Common Objects in Context (COCO) dataset while running in real time on a Tesla V100 GPU. The neck of YOLOv4 uses SPP and PAN.

[Figure 2. YOLOv4 AP vs FPS against other object detectors. Source: arXiv]

[Figure 3. Average precision vs GPU speed of YOLOv5 weights against EfficientDet on the COCO dataset. Source: Ultralytics]

What are Bag of Freebies and Bag of Specials?

They define the trade-off between training cost and inference cost of a model. The bag of freebies is the set of methods that change only the training strategy or increase only the training cost, and so do not affect inference cost. These methods include data augmentation and regularization techniques, e.g. dropout, drop-connect, and drop-block.

The bag of specials is the set of methods that improve the accuracy of the model at the expense of a small increase in inference cost. These methods include attention mechanisms and modules that enlarge the receptive field. SPP is an example of the latter and is applied in YOLOv4.
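As an example of a data augmentation "freebie", the sketch below applies a random horizontal flip to an image and mirrors its bounding boxes to match. It runs only at training time, so inference cost is unchanged. The image is represented as a plain list of pixel rows to keep the example dependency-free, and `hflip_boxes` and `random_hflip` are hypothetical helper names.

```python
import random


def hflip_boxes(boxes, image_width):
    """Mirror (x_min, y_min, x_max, y_max) pixel boxes left to right."""
    return [
        (image_width - x_max, y_min, image_width - x_min, y_max)
        for x_min, y_min, x_max, y_max in boxes
    ]


def random_hflip(image_rows, boxes, p=0.5, rng=random):
    """Flip an image (list of pixel rows) and its boxes with probability p."""
    if rng.random() >= p:
        return image_rows, boxes  # leave the sample unchanged
    width = len(image_rows[0])
    flipped = [row[::-1] for row in image_rows]
    return flipped, hflip_boxes(boxes, width)


image = [[1, 2, 3, 4], [5, 6, 7, 8]]
flipped_image, flipped_boxes = random_hflip(image, [(0, 0, 1, 2)], p=1.0)
print(flipped_image)  # [[4, 3, 2, 1], [8, 7, 6, 5]]
print(flipped_boxes)  # [(3, 0, 4, 2)]
```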

Faster R-CNN (multi stage)

R-CNN models are multi-layered convolutional neural networks that consist of a feature extractor, a region proposal algorithm that generates candidate bounding boxes, and regression and classification layers. R-CNNs trade speed for accuracy.

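In Faster R-CNN, region proposals are generated by a Region Proposal Network (RPN) that shares convolutional features with the detector, so proposal generation is no longer CPU-bound as it was in earlier flavours of region-based convolutional neural networks.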
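The RPN scores a fixed set of reference boxes (anchors) at every feature-map position. The sketch below generates the anchors for one position using three scales and three aspect ratios, the configuration described in the Faster R-CNN paper [3]. `make_anchors` is a hypothetical helper, and the choice to keep each anchor's area equal to the squared scale is an assumption of this sketch.

```python
import math


def make_anchors(center_x, center_y, scales=(128, 256, 512),
                 ratios=(0.5, 1.0, 2.0)):
    """Return (x_min, y_min, x_max, y_max) anchors centred on one position.

    Each anchor has area scale**2 and aspect ratio height / width = ratio.
    """
    anchors = []
    for scale in scales:
        area = scale * scale
        for ratio in ratios:
            width = math.sqrt(area / ratio)
            height = width * ratio
            anchors.append((
                center_x - width / 2, center_y - height / 2,
                center_x + width / 2, center_y + height / 2,
            ))
    return anchors


anchors = make_anchors(0, 0)
print(len(anchors))  # 9
print(anchors[1])    # (-64.0, -64.0, 64.0, 64.0)
```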

[Figure 4. Mean average precision against backbone accuracy of Faster R-CNN, R-FCN and SSD]

MVP - Performance comparison of YOLOv4, YOLOv5 and R-CNN

The TensorFlow - Help Protect the Great Barrier Reef MVP is implemented using Faster R-CNN, YOLOv4, and YOLOv5 with their default tuning parameters. The three models are compared using their mean average precision. Faster R-CNN runs on a ResNet Inception backbone, whereas YOLOv4 is built on Darknet.
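For reference, the sketch below shows how average precision can be computed for one class from scored detections and ground-truth boxes, using intersection over union (IoU) to decide matches. It is a simplified illustration, not the evaluation code used for the figures here: it uses a single IoU threshold and the uninterpolated precision-recall curve, and the function names are hypothetical. Mean average precision averages this value over classes (and, in COCO-style evaluation, over several IoU thresholds).

```python
def iou(a, b):
    """Intersection over union of two (x_min, y_min, x_max, y_max) boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def average_precision(detections, ground_truth, iou_threshold=0.5):
    """AP for one class; detections is a list of (score, box) pairs."""
    if not ground_truth:
        return 0.0
    matched = set()        # ground-truth indices already claimed
    true_positives = 0
    precision_sum = 0.0
    ranked = sorted(detections, key=lambda d: d[0], reverse=True)
    for rank, (_, box) in enumerate(ranked, start=1):
        best_iou, best_idx = 0.0, None
        for idx, gt_box in enumerate(ground_truth):
            if idx in matched:
                continue
            overlap = iou(box, gt_box)
            if overlap > best_iou:
                best_iou, best_idx = overlap, idx
        if best_idx is not None and best_iou >= iou_threshold:
            matched.add(best_idx)
            true_positives += 1
            precision_sum += true_positives / rank  # precision at this hit
    return precision_sum / len(ground_truth)


gt = [(0, 0, 10, 10), (20, 20, 30, 30)]
dets = [(0.9, (0, 0, 10, 10)), (0.8, (50, 50, 60, 60)), (0.7, (20, 20, 30, 31))]
print(round(average_precision(dets, gt), 4))  # 0.8333
```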

Yolov5



[Figure 5. Yolov5 performance metrics on MVP]

Yolov4



[Figure 6. Yolov4 performance metrics on MVP]

The loss metrics drop rapidly as training progresses.

FRCNN

The Faster R-CNN with a ResNet Inception backbone is resource intensive and could not produce any inference results.

Conclusion

The inference time of YOLOv5 and YOLOv4 is lower than that of Faster R-CNN. YOLOv5 has been shown to improve both the speed and accuracy of detection, and is therefore recommended for tackling the Help Protect the Great Barrier Reef task.

Resources and References

Faster RCNN Resources


YOLOv4 Resources


YOLOv5 Resources


References

[1] J. Huang et al., "Speed/accuracy trade-offs for modern convolutional object detectors", arXiv.org, 2022. [Online]. Available: https://arxiv.org/abs/1611.10012. [Accessed: 09-Feb-2022].
[2] A. Bochkovskiy, C. Wang and H. Liao, "YOLOv4: Optimal Speed and Accuracy of Object Detection", arXiv.org, 2022. [Online]. Available: https://arxiv.org/abs/2004.10934. [Accessed: 09-Feb-2022].
[3] S. Ren, K. He, R. Girshick and J. Sun, "Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks", arXiv.org, 2022. [Online]. Available: https://arxiv.org/abs/1506.01497. [Accessed: 09-Feb-2022].

kaggle-protect-the-great-barrier-reef's People

Contributors

denniesbor
