This repo hosts the notebooks and the helper shell and Python scripts used for data preprocessing and training of the R-CNN, YOLOv5, and YOLOv4 object detection neural networks.
The choice of a neural network depends on the available software and hardware resources, the required speed, and the expected accuracy. Object detection networks are classified as single-stage or multi-stage.
Examples of single-stage networks are SSD and YOLO. Multi-stage approaches add a region proposal network that operates on the feature maps extracted by the backbone. Examples of multi-stage networks are R-CNN and R-FCN.
Object detection networks consist of an input, a backbone, a neck, and a head. The input takes in an image and feeds a feature extractor made up of dense convolution and max-pooling layers. Residual Networks (ResNet), ResNeXt, DenseNet, VGG16, etc. are the commonly used backbones; they are typically pre-trained on standardized datasets such as COCO or ImageNet.
The role of the neck is to aggregate feature maps, e.g. the Feature Pyramid Network (FPN). The head of a single-stage network is a dense prediction layer, whereas a two-stage detector (e.g. R-CNN and R-FCN) uses a sparse prediction head.
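A dense prediction head emits many overlapping candidate boxes per object, which both single- and multi-stage detectors filter with non-maximum suppression (NMS) before reporting results. Below is a minimal, illustrative sketch of greedy NMS in plain Python (the `iou` and `nms` names and the `(x1, y1, x2, y2)` box format are our own choices, not tied to any particular framework):

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter else 0.0

def nms(boxes, scores, iou_threshold=0.5):
    """Greedy non-maximum suppression: repeatedly keep the highest-scoring
    box and drop any remaining box that overlaps it above the threshold."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) < iou_threshold]
    return keep
```

For example, two near-duplicate boxes around one object collapse to the higher-scoring one, while a distant box survives. Production detectors use vectorized implementations of the same idea.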
[Figure 1. Schematic representation of a single- and multi-stage neural network. Source: arXiv]
Computational resources determine the amount of time spent on training and inference. GPU and TPU runtimes accelerate both training and inference, and the computational resource demand differs from one model to another.
Speed is key in real-time object detection systems and video search engines. A balance between speed and resource requirements must be struck to achieve optimal performance.
The implementation of the minimum viable product for module one was based on the performance of Faster R-CNN ResNet Inception, YOLOv4, and YOLOv5 on the pre-processed TensorFlow - Help Protect the Great Barrier Reef dataset.
YOLO is a single-stage, state-of-the-art object detection algorithm. There are four documented versions of YOLO, and a fifth version designed by the Ultralytics team. YOLOv5 is described as a YOLOv4 implementation in PyTorch. Compared with other algorithms, YOLOv5 performs exceptionally well with less GPU time.
According to Bochkovskiy et al. [2], YOLOv4 attains a mean average precision of 43.5% running on a Tesla V100 GPU while training on the Common Objects in Context (COCO) dataset. The neck of YOLOv4 uses Spatial Pyramid Pooling (SPP) and a Path Aggregation Network (PAN).
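Mean average precision is the AP averaged over classes (and, for COCO, over IoU thresholds from 0.5 to 0.95). As a rough illustration of the underlying computation, here is a sketch of single-class AP as the area under the precision-recall curve; the `average_precision` helper is ours, and it uses the raw area rather than COCO's interpolated 101-point variant:

```python
def average_precision(matches, num_gt):
    """AP for one class. `matches` lists detections sorted by descending
    confidence, True where a detection was matched to a ground-truth box
    (e.g. at IoU >= 0.5); `num_gt` is the number of ground-truth boxes.
    Returns the area under the precision-recall curve."""
    tp = fp = 0
    ap, prev_recall = 0.0, 0.0
    for m in matches:
        tp += m          # true positive: matched a ground-truth box
        fp += not m      # false positive: unmatched detection
        recall = tp / num_gt
        precision = tp / (tp + fp)
        ap += precision * (recall - prev_recall)  # rectangle under the PR curve
        prev_recall = recall
    return ap
```

Averaging this quantity over all classes yields the mAP figure reported for detectors such as YOLOv4.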
[Figure 2. YOLOv4 AP vs. FPS against other object detectors. Source: arXiv]
[Figure 3. Average precision vs. GPU speed of YOLOv5 weights against EfficientDet on COCO datasets. Source: Ultralytics]
The bag of freebies and the bag of specials define the training-inference trade-off of a model. The bag of freebies comprises methods applied during training that do not affect inference cost, such as data augmentation and regularization techniques, e.g. dropout, DropConnect, and DropBlock.
The bag of specials comprises methods that improve the accuracy of the model at the expense of a small inference cost. These methods introduce attention mechanisms and special modules; SPP is an example of such a module and is applied in YOLOv4.
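For detection tasks, the "freebie" data augmentation must transform the bounding boxes together with the image. As a minimal sketch (the `hflip_boxes` helper and `(x1, y1, x2, y2)` pixel-coordinate convention are illustrative assumptions), a horizontal flip mirrors each box's x-coordinates about the image width:

```python
def hflip_boxes(boxes, img_width):
    """Mirror (x1, y1, x2, y2) boxes to match a horizontally flipped image:
    new_x1 = W - x2 and new_x2 = W - x1, so x1 < x2 is preserved;
    y-coordinates are unchanged."""
    return [(img_width - x2, y1, img_width - x1, y2)
            for (x1, y1, x2, y2) in boxes]
```

Applying the flip twice returns the original boxes, which is a handy sanity check when wiring up an augmentation pipeline.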
R-CNN models are multi-layered convolutional neural networks consisting of a feature extractor, a region proposal algorithm that generates bounding boxes, and regression and classification layers. R-CNNs trade speed for accuracy.
In Faster R-CNN, region proposal generation is handled by a learned Region Proposal Network on the GPU, so it is not CPU-bound like the proposal stage of the previous flavours of region-based convolutional neural networks.
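The RPN in Faster R-CNN [3] scores a fixed set of anchor boxes at each feature-map location, typically 3 scales x 3 aspect ratios = 9 anchors, and regresses offsets for the ones classified as objects. A minimal sketch of anchor generation (the `make_anchors` name and the default scales are illustrative, not the paper's exact configuration):

```python
def make_anchors(cx, cy, scales=(64, 128, 256), ratios=(0.5, 1.0, 2.0)):
    """Generate anchor boxes centred at (cx, cy), one per scale/ratio pair,
    as (x1, y1, x2, y2). Width and height are chosen so that
    w * h == scale**2 and w / h == ratio."""
    anchors = []
    for s in scales:
        for r in ratios:
            w = s * r ** 0.5   # w = scale * sqrt(ratio)
            h = s / r ** 0.5   # h = scale / sqrt(ratio)
            anchors.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return anchors
```

Sliding this set over every position of the backbone's feature map is what lets the RPN propose regions in a single GPU pass instead of running a slow CPU-side proposal algorithm.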
[Figure 4. Mean average precision against backbone accuracy of Faster R-CNN, R-FCN and SSD]
[Figure 5. Yolov5 performance metrics on MVP]
[Figure 6. Yolov4 performance metrics on MVP]
The loss metrics drop rapidly early in training.
The Faster R-CNN with a ResNet Inception backbone is resource intensive and could not complete inference. The YOLOv5 and YOLOv4 inference times are lower compared to Faster R-CNN. YOLOv5 has been shown to improve both the speed and accuracy of detection, and it is therefore recommended for tackling the Help Protect the Great Barrier Reef task.

Resources:
- Colab notebook
- Colab model
- YOLOv4 object detector
- darknet yolov4.conv.137
- yolov4.conv.137 weights
- yolov4.weights
[1]J. Huang et al., "Speed/accuracy trade-offs for modern convolutional object detectors", arXiv.org, 2022. [Online]. Available: https://arxiv.org/abs/1611.10012. [Accessed: 09- Feb- 2022].
[2]A. Bochkovskiy, C. Wang and H. Liao, "YOLOv4: Optimal Speed and Accuracy of Object Detection", arXiv.org, 2022. [Online]. Available: https://arxiv.org/abs/2004.10934. [Accessed: 09- Feb- 2022].
[3]S. Ren, K. He, R. Girshick and J. Sun, "Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks", arXiv.org, 2022. [Online]. Available: https://arxiv.org/abs/1506.01497. [Accessed: 09- Feb- 2022].