16-664 Self-Driving Cars: Perception & Control
Carnegie Mellon University
Vehicle classification is a critical task in machine learning and computer vision, with many real-world applications. In this project, we explore multiple vehicle classification techniques using pre-processing and deep learning models on driving scenes. We compare the performance of different approaches and demonstrate their potential for real-time vehicle classification. We hope our results contribute to advancing research in machine learning and computer vision and offer a starting point for improving accuracy in vehicle classification.
Vehicle classification has been an active area of research in machine learning and computer vision. We explored four different approaches to vehicle classification on 10,204 snapshots of game scenes (RGB images), combining various pre-processing methods for the training data with convolutional neural network architectures. The dataset comprised 7,573 training images and 2,631 test images, each accompanied by a 3D bounding box, a camera matrix, and a label. There were three unique labels, each corresponding to a set of similar vehicles (e.g., sedans and SUVs share label 1).
For each vehicle in a training scene image, the rotation vector, centroid, and size of its 3D bounding box are given, along with the camera matrix.
- 3D bounding box
- Cropping a vehicle from a snapshot using a 2D bounding box (computed from the max and min vertices of the 3D bounding box)
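The cropping step can be sketched as follows. This is a minimal illustration, not the project's actual code: it assumes the rotation vector is in axis-angle form, the size gives the full box extents, the centroid is expressed in the camera frame, and the camera matrix is a 3x4 projection matrix. The function names (`rodrigues`, `box_to_2d`, `crop_vehicle`) are hypothetical, and corners behind the camera are not handled.

```python
import numpy as np


def rodrigues(rvec):
    """Convert an axis-angle rotation vector to a 3x3 rotation matrix."""
    rvec = np.asarray(rvec, dtype=float)
    theta = np.linalg.norm(rvec)
    if theta < 1e-12:
        return np.eye(3)
    k = rvec / theta
    K = np.array([[0.0, -k[2], k[1]],
                  [k[2], 0.0, -k[0]],
                  [-k[1], k[0], 0.0]])
    return np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * (K @ K)


def box_to_2d(rvec, centroid, size, proj, image_shape):
    """Project the 8 corners of a 3D box; return (x0, y0, x1, y1) in pixels."""
    # Corners of a unit cube centred at the origin, scaled by the box size.
    signs = np.array([[sx, sy, sz] for sx in (-0.5, 0.5)
                      for sy in (-0.5, 0.5) for sz in (-0.5, 0.5)])
    corners = signs * np.asarray(size, dtype=float)             # (8, 3)
    corners = corners @ rodrigues(rvec).T + np.asarray(centroid, dtype=float)
    homo = np.hstack([corners, np.ones((8, 1))])                # (8, 4)
    uvw = homo @ np.asarray(proj, dtype=float).T                # (8, 3)
    uv = uvw[:, :2] / uvw[:, 2:3]                               # perspective divide
    h, w = image_shape[:2]
    # The 2D box is spanned by the min and max projected vertices.
    x0, y0 = np.floor(uv.min(axis=0)).astype(int)
    x1, y1 = np.ceil(uv.max(axis=0)).astype(int)
    # Clamp to the image so the crop indices are always valid.
    x0, x1 = np.clip([x0, x1], 0, w)
    y0, y1 = np.clip([y0, y1], 0, h)
    return int(x0), int(y0), int(x1), int(y1)


def crop_vehicle(image, box):
    """Cut the (x0, y0, x1, y1) region out of an H x W x C image array."""
    x0, y0, x1, y1 = box
    return image[y0:y1, x0:x1]
```

Given an image array `img`, a call such as `crop_vehicle(img, box_to_2d(rvec, centroid, size, proj, img.shape))` would return the vehicle patch under these assumptions.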
The first method fine-tuned a pre-trained ResNet18 [1]: training images were passed through its 17 convolutional layers, the resulting features were concatenated with the bounding box coordinates, and the combined vector was fed into a fully connected layer.
The second method applied transfer learning to a pre-trained ResNet18, replacing its last layer with a dense layer of 3 output neurons (one per label). The model was then fine-tuned on the training images.
In the third method, we combined a simple neural network, a feature extractor, and a pre-trained ResNet18. Unlike the first method, we converted the 3D bounding box coordinates for each image to 2D coordinates and used them to crop vehicles from the training images. The feature extractor then used a pre-trained ResNet18 to extract features from the cropped images. These features were then fed into the simple neural network, consisting of 2 dense layers, to capture additional features. We replaced the last layer of another pre-trained ResNet18 with this simple neural network and fine-tuned it using the full-scale training images.
In the fourth method, we modified a pre-trained ResNet18 by replacing its ReLU activation function with a leaky ReLU with a negative slope of 0.01, and its last fully connected layer with three dense layers. We added these extra layers to capture more complex features. We chose leaky ReLU based on the empirical results reported in [2], which showed that it can lower test loss for convolutional neural networks on the CIFAR-10 and CIFAR-100 benchmarks. As an experiment, we also set the negative slope to 0.001.
Our four experimental approaches were evaluated using 2,631 test images extracted from the game's 3D universe. Method 1 achieved 50.5% accuracy, while Method 2, Method 3, and Method 4 achieved 69.1%, 59.1%, and 60.8% accuracy, respectively. However, these results are dependent on the choice of hyperparameters and may vary accordingly.
Our second method achieved the highest test accuracy of the four, suggesting that adding extra layers, activation functions, or data augmentation techniques does not always improve image classification performance. Experimenting with different combinations of these techniques may be necessary, and our findings provide additional insights for the field of autonomous vehicle perception.
[1] K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 770–778, 2016.
[2] B. Xu, N. Wang, T. Chen, and M. Li. Empirical evaluation of rectified activations in convolutional network. arXiv preprint arXiv:1505.00853, 2015.
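Method | Description |
---|---|
Method 1 | Pre-trained ResNet18 features concatenated with bounding box coordinates and fed into a fully connected layer |
Method 2 | Pre-trained ResNet18 with its last layer replaced by a 3-output dense layer, fine-tuned on the training images |
Method 3 | Feature extractor (pre-trained ResNet18) on vehicles cropped with 2D bounding boxes, a simple network of 2 dense layers, and a second pre-trained ResNet18 fine-tuned on full-scale images |
Method 4 | Pre-trained ResNet18 with leaky ReLU activations and its last fully connected layer replaced by three dense layers |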