This repo hosts the notebooks and the helper shell and Python scripts used for data preprocessing and training of the R-CNN, YOLOv5, and YOLOv4 object detection neural networks.
The choice of a neural network depends on the available software and hardware resources, the required speed, and the expected accuracy. Object detection networks are classified as single-stage or multi-stage.
Examples of single-stage networks are SSD and YOLO. Multi-stage approaches add a region proposal network that operates on the feature maps extracted by the backbone. Examples of multi-stage networks are R-CNN and R-FCN.
Object detection networks consist of an input, a backbone, a neck, and a head. The input takes in an image and feeds a feature extractor made up of dense convolution and max-pooling layers. Residual Networks (ResNet), ResNeXt, DenseNet, VGG16, etc. are the commonly used backbones; they are typically pre-trained on standardized datasets such as COCO or ImageNet.
The role of the neck is to aggregate feature maps, e.g. the Feature Pyramid Network (FPN). The head of a single-stage network is a dense prediction layer, whereas a two-stage detector (e.g. R-CNN and R-FCN) uses a sparse prediction head.
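A dense prediction head emits many overlapping candidate boxes per object, which both single- and multi-stage detectors filter with non-maximum suppression (NMS) before reporting results. Below is a minimal, illustrative sketch of greedy NMS in plain Python (the `iou` and `nms` names and the `(x1, y1, x2, y2)` box format are our own choices, not tied to any particular framework):

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter else 0.0

def nms(boxes, scores, iou_threshold=0.5):
    """Greedy non-maximum suppression: repeatedly keep the highest-scoring
    box and drop any remaining box that overlaps it above the threshold."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) < iou_threshold]
    return keep
```

For example, two near-duplicate boxes around one object collapse to the higher-scoring one, while a distant box survives. Production detectors use vectorized implementations of the same idea.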
[Figure 1. Schematic representation of a single- and multi-stage neural network. Source: arXiv]
Computational resources determine the amount of time spent on training and inference. GPU and TPU runtimes accelerate both training and inference, and the computational resource demand differs from one model to another.
Speed is key in real-time object detection systems and video search engines. A balance between speed and resource requirements must be struck to achieve optimal performance.
The implementation of the minimum viable product for module one was based on the performance of Faster R-CNN ResNet Inception, YOLOv4, and YOLOv5 on the pre-processed TensorFlow - Help Protect the Great Barrier Reef dataset.
YOLO is a single-stage, state-of-the-art object detection algorithm. There are four documented versions of YOLO, and a fifth version designed by the Ultralytics team. YOLOv5 is described as a YOLOv4 implementation in PyTorch. Compared with other algorithms, YOLOv5 performs exceptionally well with less GPU time.
According to Bochkovskiy et al. [2], YOLOv4 attains a mean average precision of 43.5% running on a Tesla V100 GPU while training on the Common Objects in Context (COCO) dataset. The neck of YOLOv4 uses Spatial Pyramid Pooling (SPP) and a Path Aggregation Network (PAN).
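Mean average precision is the AP averaged over classes (and, for COCO, over IoU thresholds from 0.5 to 0.95). As a rough illustration of the underlying computation, here is a sketch of single-class AP as the area under the precision-recall curve; the `average_precision` helper is ours, and it uses the raw area rather than COCO's interpolated 101-point variant:

```python
def average_precision(matches, num_gt):
    """AP for one class. `matches` lists detections sorted by descending
    confidence, True where a detection was matched to a ground-truth box
    (e.g. at IoU >= 0.5); `num_gt` is the number of ground-truth boxes.
    Returns the area under the precision-recall curve."""
    tp = fp = 0
    ap, prev_recall = 0.0, 0.0
    for m in matches:
        tp += m          # true positive: matched a ground-truth box
        fp += not m      # false positive: unmatched detection
        recall = tp / num_gt
        precision = tp / (tp + fp)
        ap += precision * (recall - prev_recall)  # rectangle under the PR curve
        prev_recall = recall
    return ap
```

Averaging this quantity over all classes yields the mAP figure reported for detectors such as YOLOv4.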
[Figure 2. YOLOv4 AP vs. FPS against other object detectors. Source: arXiv]
[Figure 3. Average precision vs. GPU speed of YOLOv5 weights against EfficientDet on COCO datasets. Source: Ultralytics]
The bag of freebies and the bag of specials define the training-inference trade-off of a model. The bag of freebies comprises methods applied during training that do not affect inference cost, such as data augmentation and regularization techniques, e.g. dropout, DropConnect, and DropBlock.
The bag of specials comprises methods that improve the accuracy of the model at the expense of a small inference cost. These methods introduce attention mechanisms and special modules; SPP is an example of such a module and is applied in YOLOv4.
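For detection tasks, the "freebie" data augmentation must transform the bounding boxes together with the image. As a minimal sketch (the `hflip_boxes` helper and `(x1, y1, x2, y2)` pixel-coordinate convention are illustrative assumptions), a horizontal flip mirrors each box's x-coordinates about the image width:

```python
def hflip_boxes(boxes, img_width):
    """Mirror (x1, y1, x2, y2) boxes to match a horizontally flipped image:
    new_x1 = W - x2 and new_x2 = W - x1, so x1 < x2 is preserved;
    y-coordinates are unchanged."""
    return [(img_width - x2, y1, img_width - x1, y2)
            for (x1, y1, x2, y2) in boxes]
```

Applying the flip twice returns the original boxes, which is a handy sanity check when wiring up an augmentation pipeline.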
R-CNN models are multi-layered convolutional neural networks consisting of a feature extractor, a region proposal algorithm that generates bounding boxes, and regression and classification layers. R-CNNs trade speed for accuracy.
In Faster R-CNN, region proposal generation is handled by a learned Region Proposal Network on the GPU, so it is not CPU-bound like the proposal stage of the previous flavours of region-based convolutional neural networks.
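The RPN in Faster R-CNN [3] scores a fixed set of anchor boxes at each feature-map location, typically 3 scales x 3 aspect ratios = 9 anchors, and regresses offsets for the ones classified as objects. A minimal sketch of anchor generation (the `make_anchors` name and the default scales are illustrative, not the paper's exact configuration):

```python
def make_anchors(cx, cy, scales=(64, 128, 256), ratios=(0.5, 1.0, 2.0)):
    """Generate anchor boxes centred at (cx, cy), one per scale/ratio pair,
    as (x1, y1, x2, y2). Width and height are chosen so that
    w * h == scale**2 and w / h == ratio."""
    anchors = []
    for s in scales:
        for r in ratios:
            w = s * r ** 0.5   # w = scale * sqrt(ratio)
            h = s / r ** 0.5   # h = scale / sqrt(ratio)
            anchors.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return anchors
```

Sliding this set over every position of the backbone's feature map is what lets the RPN propose regions in a single GPU pass instead of running a slow CPU-side proposal algorithm.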
[Figure 4. Mean average precision against backbone accuracy of Faster R-CNN, R-FCN and SSD]
[Figure 5. Yolov5 performance metrics on MVP]
[Figure 6. Yolov4 performance metrics on MVP]
The loss metrics drop rapidly early in training.
The Faster R-CNN with a ResNet Inception backbone is resource intensive and could not complete inference. The YOLOv5 and YOLOv4 inference times are lower compared to Faster R-CNN. YOLOv5 has been shown to improve both the speed and accuracy of detection, and it is therefore recommended for tackling the Help Protect the Great Barrier Reef task.

Resources:
- Colab notebook
- Colab model
- YOLOv4 object detector
- darknet yolov4.conv.137
- yolov4.conv.137 weights
- yolov4.weights
[1]J. Huang et al., "Speed/accuracy trade-offs for modern convolutional object detectors", arXiv.org, 2022. [Online]. Available: https://arxiv.org/abs/1611.10012. [Accessed: 09- Feb- 2022].
[2]A. Bochkovskiy, C. Wang and H. Liao, "YOLOv4: Optimal Speed and Accuracy of Object Detection", arXiv.org, 2022. [Online]. Available: https://arxiv.org/abs/2004.10934. [Accessed: 09- Feb- 2022].
[3]S. Ren, K. He, R. Girshick and J. Sun, "Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks", arXiv.org, 2022. [Online]. Available: https://arxiv.org/abs/1506.01497. [Accessed: 09- Feb- 2022].