Self-Driving Cars Perception - Image Classification

16-664 Self-Driving Cars: Perception & Control

Carnegie Mellon University

In-class Kaggle Challenge


Comparing Multiple Techniques for Vehicle Classification

[Figure: driving scene snapshot]

Abstract

Vehicle classification is a critical task in machine learning and computer vision, with many real-world applications. In this project, we explore multiple vehicle classification techniques, applying various pre-processing methods and deep learning models to driving-scene images. We compare the performance of the different approaches and demonstrate their potential for real-time vehicle classification. We hope our results contribute to advancing research in machine learning and computer vision and offer a starting point for improving accuracy in vehicle classification.

Introduction

Vehicle classification has been an active area of research in the fields of machine learning and computer vision. We explored four different approaches for vehicle classification using 10,204 snapshots of driving scenes, applying various pre-processing methods to the training data (RGB images of game scenes) and different convolutional neural network architectures. The dataset comprised 7,573 training images and 2,631 test images; each sample contains an RGB image, a 3D bounding box, a camera matrix, and a label. The dataset has three unique labels, each corresponding to a set of similar vehicles (e.g., sedans and SUVs share label 1).

Bounding Boxes

Rotation vectors, centroids, sizes, and camera matrices of the 3D bounding boxes are given for each vehicle in a training scene image.

  • 3D bounding box


  • Cropping a vehicle from a snapshot using a 2D bounding box, computed from the max and min vertices of the projected 3D bounding box (a sketch of this step follows below)

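The projection step can be sketched as follows. This is a minimal illustration, assuming the rotation vector is an axis-angle (Rodrigues) vector, the camera matrix is a 3×4 projection matrix, and the image is a NumPy array; the dataset's exact conventions may differ, and `crop_vehicle` is a hypothetical helper, not code from this repository.

```python
import numpy as np
from scipy.spatial.transform import Rotation

def crop_vehicle(image, rot_vec, centroid, size, camera_matrix):
    """Crop one vehicle from a scene image via its projected 3D bounding box."""
    # Eight corners of an axis-aligned box of the given size, centered at the origin
    half = np.asarray(size) / 2.0
    corners = np.array([[sx, sy, sz]
                        for sx in (-half[0], half[0])
                        for sy in (-half[1], half[1])
                        for sz in (-half[2], half[2])])
    # Rotate the box and move it to its centroid
    R = Rotation.from_rotvec(rot_vec).as_matrix()
    corners = corners @ R.T + np.asarray(centroid)
    # Project the corners to pixel coordinates with the 3x4 camera matrix
    homog = np.hstack([corners, np.ones((8, 1))])   # (8, 4)
    proj = homog @ camera_matrix.T                  # (8, 3)
    uv = proj[:, :2] / proj[:, 2:3]
    # 2D box = min/max of the projected vertices, clamped to the image bounds
    h, w = image.shape[:2]
    u_min, v_min = np.clip(uv.min(axis=0), 0, [w - 1, h - 1]).astype(int)
    u_max, v_max = np.clip(uv.max(axis=0), 0, [w - 1, h - 1]).astype(int)
    return image[v_min:v_max + 1, u_min:u_max + 1]
```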

Experiments

The first method fine-tuned a pre-trained ResNet18 by passing training images through its 17 convolutional layers and concatenating the bounding box coordinates with the resulting features before feeding them into a fully connected layer.
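
A hedged sketch of this architecture, assuming the bounding box is flattened to 24 values (8 vertices × 3 coordinates); the dimensions and names below are illustrative, not the project's exact configuration:

```python
import torch
import torch.nn as nn
from torchvision import models

class ResNetWithBox(nn.Module):
    """ResNet18 trunk whose 512-d features are concatenated with box coordinates."""
    def __init__(self, num_classes=3, box_dim=24):
        super().__init__()
        backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
        # Keep everything up to (and including) global average pooling
        self.trunk = nn.Sequential(*list(backbone.children())[:-1])
        self.fc = nn.Linear(512 + box_dim, num_classes)

    def forward(self, images, boxes):
        feats = self.trunk(images).flatten(1)      # (B, 512)
        feats = torch.cat([feats, boxes], dim=1)   # append bounding box coordinates
        return self.fc(feats)

model = ResNetWithBox()
logits = model(torch.randn(2, 3, 224, 224), torch.randn(2, 24))  # smoke test
```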

The second method used transfer learning: we took a pre-trained ResNet18, replaced its last layer with a dense layer of 3 output neurons, and fine-tuned the model on the training images.
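
In PyTorch this amounts to swapping the classifier head; a minimal sketch, with the learning rate as an assumption:

```python
import torch.nn as nn
import torch.optim as optim
from torchvision import models

# Load a pre-trained ResNet18 and swap the classifier for a 3-class head
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
model.fc = nn.Linear(model.fc.in_features, 3)   # 3 vehicle classes

# Fine-tune end to end with cross-entropy loss and Adam (per the appendix);
# the learning rate is an assumption
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=1e-4)
```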

In the third method, we combined a simple neural network, a feature extractor, and a pre-trained ResNet18. Unlike the first method, we converted the 3D bounding box coordinates for each image to 2D coordinates and used them to crop vehicles from the training images. The feature extractor then used a pre-trained ResNet18 to extract features from the cropped images. These features were then fed into the simple neural network, consisting of 2 dense layers, to capture additional features. We replaced the last layer of another pre-trained ResNet18 with this simple neural network and fine-tuned it using the full-scale training images.
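
One possible wiring of these pieces is sketched below; the hidden size of the two-layer head is an assumption, since the report does not specify it:

```python
import torch.nn as nn
from torchvision import models

def two_layer_head(in_dim=512, hidden=128, num_classes=3):
    """The 'simple neural network': two dense layers (hidden size assumed)."""
    return nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU(),
                         nn.Linear(hidden, num_classes))

# Feature extractor: a pre-trained ResNet18 applied to cropped vehicle images
extractor = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
extractor.fc = nn.Identity()   # emit 512-d features instead of class scores

# Final model: another pre-trained ResNet18 whose last layer is replaced by
# the simple neural network, fine-tuned on full-scale scene images
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
model.fc = two_layer_head()
```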

In the fourth method, we modified a pre-trained ResNet18 by replacing its ReLU activation function with a leaky ReLU (negative slope 0.01) and its last fully connected layer with three dense layers. We added these extra layers to capture more complex features. We chose leaky ReLU based on empirical results reported in [2], which showed that it can lower test loss for convolutional neural networks on the CIFAR-10 and CIFAR-100 benchmarks. We also experimented with a negative slope of 0.001.

$$ y_i = \begin{cases} x_i & \text{if } x_i \geq 0 \\ \frac{x_i}{a_i} & \text{if } x_i < 0 \end{cases} $$

Here $a_i$ is a fixed constant equal to the reciprocal of the negative slope (e.g., $a_i = 100$ for a slope of 0.01).
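
A sketch of Method 4's modifications: every ReLU is recursively swapped for a leaky ReLU, and three dense layers replace the single classifier layer (the hidden sizes 256 and 64 are illustrative assumptions):

```python
import torch.nn as nn
from torchvision import models

def replace_relu(module, slope=0.01):
    """Recursively swap every ReLU in the network for a leaky ReLU."""
    for name, child in module.named_children():
        if isinstance(child, nn.ReLU):
            setattr(module, name, nn.LeakyReLU(negative_slope=slope, inplace=True))
        else:
            replace_relu(child, slope)

model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
replace_relu(model)                      # leaky ReLU, negative slope 0.01
model.fc = nn.Sequential(                # three dense layers replace the classifier
    nn.Linear(512, 256), nn.LeakyReLU(0.01),
    nn.Linear(256, 64), nn.LeakyReLU(0.01),
    nn.Linear(64, 3),
)
```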

Results

Our four experimental approaches were evaluated on 2,631 test images extracted from the game's 3D universe. Method 1 achieved 50.5% accuracy, while Methods 2, 3, and 4 achieved 69.1%, 59.1%, and 60.8%, respectively. These results depend on the choice of hyperparameters and may vary accordingly.

Conclusion

Our second method achieved the highest test accuracy among all experimental methods, suggesting that adding extra layers, activation functions, or data augmentation techniques does not always improve image classification performance. Experimenting with different combinations of these techniques may be necessary, and our findings offer additional insight for the field of autonomous vehicle perception.

References

[1] K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 770–778, 2016.

[2] B. Xu, N. Wang, T. Chen, and M. Li. Empirical evaluation of rectified activations in convolutional network. arXiv preprint arXiv:1505.00853, 2015.

Appendix

Method 1
  • Model - ResNet18 (17 convolutional layers & 1 dense layer; ReLU activation)
  • Data - Full-scale scene images
  • Additional Features - Coordinates of 3D bounding box around each vehicle
  • Preprocessing - Resizing, Normalization, Horizontal-Flipping
  • Stochastic Gradient Descent, Cross Entropy Loss
Method 2
  • Model - ResNet18 (17 convolutional layers & 1 dense layer; ReLU activation); Last layer modified to accept 3 classes
  • Data - Full-scale scene images
  • Preprocessing - Resizing, Normalization
  • Adam Optimizer, Cross Entropy Loss
Method 3
  • Model - ResNet18 (17 convolutional layers & 1 dense layer; ReLU activation); Feature Extractor (another ResNet18); Simple Neural Network (2 dense layers)
  • Data - Full-scale scene images
  • Additional Features - Coordinates of 2D bounding box around each vehicle
  • Preprocessing - Resizing, Normalization, Horizontal Flipping, Rotation
  • Adam Optimizer, Cross Entropy Loss
Method 4
  • Model - Modified ResNet18 (17 convolutional layers & 3 dense layers; Leaky ReLU activation)
  • Data - Full-scale scene images
  • Preprocessing - Resizing, Normalization
  • Adam Optimizer, Cross Entropy Loss
