Object Detection in Urban Environment

Object Detection in Urban Environment

Introduction

Utilizing transfer learning using TensorFlow object detection API and AWS Sagemaker to train models to detect and classify objects using data from the Waymo Open Dataset.

Dataset

Front Camera Images from Waymo Open Dataset. Data are in TFRecord Format, the TFRecord format is a simple format for storing a sequence of binary records, which helps in data reading and processing efficiency.

Methodology

Training & Deployment Process with AWS

AWS Sagemaker for running Jupyter notebooks, training and deploying the model, and inference.
AWS Elastic Container Registry (ECR) to build the Docker image and create the container required for running this project.
AWS Simple Storage Service (S3) to save logs for creating visualizations. Also, the data for this project was stored in a public S3 bucket.

Model Selection

For this project, I tested several object detection models using the Tensorflow Object Detection API. The models tested were:

EfficientNet D1

SSD MobileNet V2 FPNLite

SSD ResNet50 V1 FPN

These pre-trained models are available in the TensorFlow 2 Object Detection Model Zoo, and they were trained on the COCO 2017 dataset. So, their pipeline.config files need to be adjusted so TensorFlow 2 can find the TFRecord and label_map.pbtxt files when they are loaded inside the container from Amazon S3.

Since the Waymo dataset has only 3 classes, Cars, Pedestrians, and Cyclists, the pipeline.config was adjusted to the project specifications instead of the 90 classes that were there for the COCO dataset.

For the 3 Models I used a fixed number of training steps which is 2000, this was due to limited AWS budget. I used Momentum Optimizer with the same batch size of 8 in the 3 experiments, for the same reason.

Results

Each model was evaluated using the mAP metric, which measures the accuracy of the model in detecting objects. The mAP is calculated based on the precision and recall of the model at different IoU (Intersection over Union) thresholds.

Tensorboard was used to visualize the training loss and validation mAP for each model. From the Tensorboard graphs, we observed that the models showed similar patterns in terms of training loss, but differed in their ability to generalize to the test data.

	EfficientNet D1	SSD MobileNet V2 FPNLite	SSD ResNet50 V1 FPN
mAP@ (0.5:0.95) IOU	0.0938	0.09543	0.05755
[email protected]	0.2253	0.2234	0.1248
[email protected]	0.0668	0.071	0.04505
mAP (small objects)	0.01484	0.0392	0.02317
mAP (medium objects)	0.364	0.3383	0.2107
mAP (large objects)	0.839	0.4531	0.1917
Predicted Vs Ground Truth Sample
Video

Based on the results of the three models evaluated for object detection in an urban environment, the SSD MobileNet V2 FPNLite model performed the best with an mAP@(0.5:0.05:0.95) IOU of 0.09543, outperforming both the EfficientNet D1 and SSD ResNet50 V1 FPN models.

In terms of detecting small objects like cyclists and pedestrians, the SSD MobileNet V2 FPNLite also had the highest mAP, indicating its ability to detect smaller objects better than the other models. However, the EfficientNet D1 had the highest mAP for large objects, suggesting that it may perform better in detecting larger objects like e.g nearby cars.

The three models had poor performance in detecting cyclists. This may be a result due to the skewness of the dataset, where cars are the dominant class in the dataset, and the cyclists class is the least abundant.

Overall, the model selection process showed that different models have different strengths and weaknesses in object detection, and choosing the right model for a specific application requires careful consideration of the type and size of the objects to be detected. Additionally, the results suggest that the ResNet50 model may not be the best choice for object detection in an urban environment, at least not without further optimization and tuning.

Here are the training losses of the 3 experiments:

The plots show that the 3 models could achieve better loss if we increased the n. of training steps because there is room for convergence.

Future Work & Possible Improvement

Identified several potential avenues for improving performance, but they would require additional resources and a higher computing budget. These include:

We can increase the training steps: Each model was trained for only 2000 steps, which is relatively low for such kinds of data and complex architectures. So, increasing the number of training steps till the loss reaches the plateau can further improve our performance.

We can apply data augmentation techniques such as flipping, scaling, and random cropping. Additionally, we can explore more advanced techniques such as color jittering, rotation, and translation, which can further enhance the accuracy of our model.

We should consider hyperparameter tuning. Fine-tuning the hyperparameters can potentially improve our model's performance by finding optimal values for parameters like learning rate, batch size, and regularization.

Another area to focus on is handling occlusion and partial object detection. In this project, our focus was on detecting complete objects. However, in an urban environment, objects are often partially occluded or obstructed. Developing techniques to handle partial object detection can be crucial in improving the overall performance of our model.

By addressing these areas, we can potentially achieve significant performance improvements, although it's important to note that they require additional resources and a higher computing budget.

vvlladd / objectdetectioninurbanenv Goto Github PK

objectdetectioninurbanenv's Introduction

Object Detection in Urban Environment

Table Of Contents

Introduction

Dataset

Methodology

Training & Deployment Process with AWS

Model Selection

Results

Future Work & Possible Improvement

objectdetectioninurbanenv's People

Contributors

Watchers

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent

Jobs