Object Detection in Urban Environment

Introduction

This project uses transfer learning with the TensorFlow Object Detection API and AWS SageMaker to train models that detect and classify objects in data from the Waymo Open Dataset.

Dataset

The data are front-camera images from the Waymo Open Dataset, stored in TFRecord format. TFRecord is a simple format for storing a sequence of binary records, which makes reading and processing the data more efficient.
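To make "a sequence of binary records" concrete, here is a minimal sketch of how a TFRecord file is framed on disk: each record is a little-endian 64-bit length, a 32-bit checksum of that length, the payload bytes, and a 32-bit checksum of the payload. The function names are illustrative and not part of this project, and the sketch skips checksum verification. The demo stream uses placeholder checksums and dummy payloads rather than real Waymo data.

```python
import io
import struct
from typing import BinaryIO, Iterator


def iter_tfrecord_payloads(stream: BinaryIO) -> Iterator[bytes]:
    """Yield the raw payload of each record in a TFRecord byte stream.

    Record layout:
      uint64 length | uint32 CRC of length | payload | uint32 CRC of payload
    CRC verification is skipped in this sketch.
    """
    while True:
        header = stream.read(8)
        if not header:
            return  # clean end of stream
        if len(header) < 8:
            raise ValueError("truncated record header")
        (length,) = struct.unpack("<Q", header)
        stream.read(4)  # checksum of the length field (not checked here)
        payload = stream.read(length)
        if len(payload) < length:
            raise ValueError("truncated record payload")
        stream.read(4)  # checksum of the payload (not checked here)
        yield payload


def frame_record(payload: bytes) -> bytes:
    """Frame a payload like a TFRecord entry, using placeholder (zero) CRCs.

    Demo helper only: real files carry valid masked CRC32C values.
    """
    return struct.pack("<Q", len(payload)) + b"\x00" * 4 + payload + b"\x00" * 4


# Two dummy records stand in for serialized camera frames.
demo = io.BytesIO(frame_record(b"frame-0") + frame_record(b"frame-1"))
records = list(iter_tfrecord_payloads(demo))
print(records)
```

In practice the files would normally be read with TensorFlow's own reader (for example `tf.data.TFRecordDataset`), which also validates the checksums.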

Methodology

Training & Deployment Process with AWS

  • AWS SageMaker for running Jupyter notebooks, training and deploying the model, and inference.
  • AWS Elastic Container Registry (ECR) to build the Docker image and create the container required for running this project.
  • AWS Simple Storage Service (S3) to save logs for creating visualizations. Also, the data for this project was stored in a public S3 bucket.

Model Selection

For this project, I tested several object detection models using the TensorFlow Object Detection API. The models tested were:

  • EfficientNet D1
  • SSD MobileNet V2 FPNLite
  • SSD ResNet50 V1 FPN

These pre-trained models are available in the TensorFlow 2 Object Detection Model Zoo and were trained on the COCO 2017 dataset. Their pipeline.config files therefore need to be adjusted so that TensorFlow 2 can find the TFRecord and label_map.pbtxt files once they are loaded into the container from Amazon S3.

Since the Waymo dataset has only 3 classes (cars, pedestrians, and cyclists), the number of classes in pipeline.config was changed from the 90 used for the COCO dataset to 3, in line with the project specifications.
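The config adjustments described above can be sketched as plain text rewriting of a pipeline.config file. This is only an illustration, not the script used in this project: the helper names are hypothetical, the container paths are assumed, and the sketch assumes that `train_input_reader` appears before `eval_input_reader` in the file. The field names (`num_classes`, `label_map_path`, `input_path`) are the ones used in the Model Zoo configs.

```python
import re


def set_field(config_text: str, field: str, value: str) -> str:
    """Replace the value of every `field: value` pair in a text-format config."""
    pattern = r'\b(' + re.escape(field) + r'\s*:\s*)("[^"]*"|[^\s}]+)'
    return re.sub(pattern, lambda m: m.group(1) + value, config_text)


def quote(path: str) -> str:
    """Wrap a path in double quotes, as the config format expects for strings."""
    return f'"{path}"'


def retarget_config(text, num_classes, label_map, train_records, eval_records):
    """Point a COCO-trained config at a new dataset (sketch, see assumptions)."""
    text = set_field(text, "num_classes", str(num_classes))
    text = set_field(text, "label_map_path", quote(label_map))
    # Assumption: the training reader is defined before the evaluation reader.
    head, sep, tail = text.partition("eval_input_reader")
    head = set_field(head, "input_path", quote(train_records))
    tail = set_field(tail, "input_path", quote(eval_records))
    return head + sep + tail


# Trimmed-down stand-in for a Model Zoo pipeline.config.
original = """
model { ssd { num_classes: 90 } }
train_input_reader {
  label_map_path: "PATH_TO_BE_CONFIGURED"
  tf_record_input_reader { input_path: "PATH_TO_BE_CONFIGURED" }
}
eval_input_reader {
  label_map_path: "PATH_TO_BE_CONFIGURED"
  tf_record_input_reader { input_path: "PATH_TO_BE_CONFIGURED" }
}
"""

# Hypothetical container paths; the real ones depend on how S3 data is mounted.
updated = retarget_config(
    original,
    num_classes=3,
    label_map="/opt/ml/input/data/train/label_map.pbtxt",
    train_records="/opt/ml/input/data/train/*.tfrecord",
    eval_records="/opt/ml/input/data/val/*.tfrecord",
)
print(updated)
```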

For all 3 models I used a fixed 2000 training steps because of the limited AWS budget. For the same reason, I used the Momentum optimizer with the same batch size of 8 in all 3 experiments.
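For readers unfamiliar with the Momentum optimizer, the update rule can be sketched in a few lines. This is a generic illustration on a toy one-dimensional problem, not the training code used here, and the learning rate and momentum values below are assumed for the demo rather than taken from the project's pipeline.config.

```python
def momentum_step(param, grad, velocity, learning_rate=0.1, momentum=0.9):
    """One SGD-with-momentum update for a single scalar parameter.

    The velocity accumulates past gradients, and the parameter moves
    along the accumulated direction.
    """
    velocity = momentum * velocity + grad
    param = param - learning_rate * velocity
    return param, velocity


# Toy problem: minimize f(x) = x^2, whose gradient is 2x.
x, v = 5.0, 0.0
for _ in range(200):
    x, v = momentum_step(x, grad=2.0 * x, velocity=v)

print(f"x after 200 steps: {x:.6f}")
```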

Results

Each model was evaluated using the mean Average Precision (mAP) metric, which summarizes how accurately the model detects objects. mAP is computed from the model's precision and recall at different IoU (Intersection over Union) thresholds.
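As a rough illustration of the building blocks behind mAP, the sketch below computes IoU for two boxes and then precision and recall at a given IoU threshold, using a simple greedy matching of detections to ground-truth boxes. It is a simplified stand-in for the COCO evaluation the Object Detection API runs, not a reimplementation of it. The boxes and scores are made up for the example.

```python
from typing import List, Sequence, Tuple

Box = Sequence[float]  # (ymin, xmin, ymax, xmax)


def iou(a: Box, b: Box) -> float:
    """Intersection over Union of two axis-aligned boxes."""
    inter_h = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_w = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_h * inter_w
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def precision_recall(
    detections: List[Tuple[float, Box]], truths: List[Box], threshold: float
) -> Tuple[float, float]:
    """Greedily match detections (highest score first) to unmatched truths."""
    matched = set()
    true_positives = 0
    for _score, box in sorted(detections, key=lambda d: d[0], reverse=True):
        best_index, best_iou = -1, threshold
        for j, truth in enumerate(truths):
            if j in matched:
                continue
            overlap = iou(box, truth)
            if overlap >= best_iou:
                best_index, best_iou = j, overlap
        if best_index >= 0:
            matched.add(best_index)
            true_positives += 1
    precision = true_positives / len(detections) if detections else 0.0
    recall = true_positives / len(truths) if truths else 0.0
    return precision, recall


# Made-up example: two ground-truth boxes and three scored detections.
truths = [(0, 0, 10, 10), (20, 20, 30, 30)]
detections = [
    (0.9, (0, 0, 10, 8)),
    (0.6, (21, 21, 31, 31)),
    (0.3, (50, 50, 60, 60)),
]

# The IoU thresholds behind mAP@(0.5:0.05:0.95).
coco_thresholds = [round(0.5 + 0.05 * i, 2) for i in range(10)]

for t in (0.5, 0.75):
    p, r = precision_recall(detections, truths, t)
    print(f"IoU >= {t}: precision={p:.3f}, recall={r:.3f}")
```

A stricter threshold turns loosely localized detections into false positives, which is why mAP at 0.75 IoU is lower than mAP at 0.5 IoU for the same model.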

TensorBoard was used to visualize the training loss and validation mAP for each model. The TensorBoard graphs show that the models followed similar training-loss patterns but differed in their ability to generalize to the test data.

Metric                   EfficientNet D1   SSD MobileNet V2 FPNLite   SSD ResNet50 V1 FPN
mAP@(0.5:0.95) IoU       0.0938            0.09543                    0.05755
mAP@0.5 IoU              0.2253            0.2234                     0.1248
mAP@0.75 IoU             0.0668            0.071                      0.04505
mAP (small objects)      0.01484           0.0392                     0.02317
mAP (medium objects)     0.364             0.3383                     0.2107
mAP (large objects)      0.839             0.4531                     0.1917

[Images: predicted vs. ground truth sample for EfficientNet, MobileNet, ResNet50]
[Videos: EfficientNet, MobileNet, ResNet50]

Of the three models evaluated for object detection in an urban environment, SSD MobileNet V2 FPNLite performed best, with an mAP@(0.5:0.05:0.95) IoU of 0.09543. It narrowly outperformed EfficientNet D1 (0.0938) and clearly outperformed SSD ResNet50 V1 FPN (0.05755).

For small objects such as cyclists and pedestrians, SSD MobileNet V2 FPNLite also had the highest mAP (0.0392), indicating that it detects smaller objects better than the other models. However, EfficientNet D1 had the highest mAP for large objects (0.839), suggesting that it may perform better on larger objects such as nearby cars.

All three models performed poorly at detecting cyclists. This may be due to the skew of the dataset, in which cars are the dominant class and cyclists are the least abundant.
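One way to check this kind of skew is to count how often each class appears among the ground-truth labels. The sketch below does that for a list of label names. The counts are hypothetical and chosen only to illustrate an imbalanced dataset; they are not measurements from the Waymo data used here.

```python
from collections import Counter
from typing import Dict, List


def class_shares(labels: List[str]) -> Dict[str, float]:
    """Return the fraction of annotations that belongs to each class."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {name: count / total for name, count in counts.items()}


# Hypothetical annotation counts, for illustration only.
labels = ["car"] * 80 + ["pedestrian"] * 18 + ["cyclist"] * 2

shares = class_shares(labels)
for name, share in sorted(shares.items(), key=lambda item: item[1], reverse=True):
    print(f"{name:<12}{share:.1%}")
```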

Overall, the model selection process showed that different models have different strengths and weaknesses in object detection, and choosing the right model for a specific application requires careful consideration of the type and size of the objects to be detected. Additionally, the results suggest that the ResNet50 model may not be the best choice for object detection in an urban environment, at least not without further optimization and tuning.

Here are the training losses of the 3 experiments:

The plots show that all 3 models could reach a lower loss with more training steps, since the loss curves had not yet converged.

Future Work & Possible Improvement

I identified several potential avenues for improving performance, all of which would require additional resources and a higher computing budget. These include:

We can increase the training steps. Each model was trained for only 2000 steps, which is relatively low for this kind of data and for such complex architectures. Increasing the number of training steps until the loss plateaus could further improve performance.
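"Until the loss plateaus" can be turned into a simple check on the logged loss values. The following is a minimal sketch that compares the mean loss of the most recent window of steps with the window before it. The function name, window size, and tolerance are assumptions for illustration, and the loss values are made up.

```python
from typing import Sequence


def has_plateaued(losses: Sequence[float], window: int = 5, min_delta: float = 1e-3) -> bool:
    """Return True when the mean loss of the last `window` values improved by
    less than `min_delta` compared with the `window` values before them."""
    if len(losses) < 2 * window:
        return False  # not enough history to decide
    previous = sum(losses[-2 * window:-window]) / window
    recent = sum(losses[-window:]) / window
    return previous - recent < min_delta


# Made-up loss curves: one still improving, one flat.
still_improving = [1.0, 0.5, 0.33, 0.25, 0.2, 0.17]
flat = [0.3, 0.3, 0.3, 0.3, 0.3, 0.3]

print(has_plateaued(still_improving, window=3))  # False
print(has_plateaued(flat, window=3))             # True
```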

We can apply data augmentation techniques such as flipping, scaling, and random cropping. Additionally, we can explore more advanced techniques such as color jittering, rotation, and translation, which can further enhance the accuracy of our model.
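As a small example of what an augmentation has to do in object detection, a horizontal flip must mirror the bounding boxes along with the pixels. The sketch below assumes boxes in normalized (ymin, xmin, ymax, xmax) order and represents the image as a nested list of pixel values. It is an illustration of the idea, not the augmentation code used by the Object Detection API.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # normalized (ymin, xmin, ymax, xmax)


def flip_horizontal(
    image: List[List[int]], boxes: List[Box]
) -> Tuple[List[List[int]], List[Box]]:
    """Mirror an image left to right and adjust its boxes to match."""
    flipped_image = [row[::-1] for row in image]
    # The x-range [xmin, xmax] maps to [1 - xmax, 1 - xmin]; y is unchanged.
    flipped_boxes = [
        (ymin, 1.0 - xmax, ymax, 1.0 - xmin) for ymin, xmin, ymax, xmax in boxes
    ]
    return flipped_image, flipped_boxes


# Tiny 2x3 "image" and one box near the left edge.
image = [[1, 2, 3], [4, 5, 6]]
boxes = [(0.2, 0.1, 0.6, 0.4)]

new_image, new_boxes = flip_horizontal(image, boxes)
print(new_image)
print(new_boxes)
```

After the flip the box sits near the right edge, with its width and height unchanged.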

We should consider hyperparameter tuning. Fine-tuning the hyperparameters can potentially improve our model's performance by finding optimal values for parameters like learning rate, batch size, and regularization.

Another area to focus on is handling occlusion and partial object detection. In this project, our focus was on detecting complete objects. However, in an urban environment, objects are often partially occluded or obstructed. Developing techniques to handle partial object detection can be crucial in improving the overall performance of our model.

Addressing these areas could yield significant performance improvements, at the cost of additional resources and a higher computing budget.
