
Lyft Perception Challenge

In May 2018, Udacity and Lyft organized a competition for accurate and fast classification of dashcam images as part of a self-driving car's perception process. This project identifies vehicles and drivable road to meet the requirements of that competition, and it won 28th place. All dashcam images were extracted from the CARLA simulator.

[Image: Dashcam Image Leaderboard]

Highlights

  • Segmentation model combines ideas from SegNet and Inception
  • Inference server process avoids loading libraries and models during a realtime process
  • Multiprocessing better utilizes both CPU and GPU at the same time
  • Weighted mean-squared error loss function optimizes segmentation for the rare class (sketched below)
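
The weighted loss is the one highlight not detailed in a later section, so here is a rough sketch of the idea. The weight value and function name are illustrative assumptions, not taken from this project:

import tensorflow.keras.backend as K

def weighted_mse(rare_class_weight=10.0):
    # Errors on pixels of the rare class (cars) count more than errors on
    # common background pixels, so the optimizer cannot drive the loss down
    # by simply ignoring small, infrequent objects.
    # The weight value is a placeholder, not the one used in this project.
    def loss(y_true, y_pred):
        weights = 1.0 + (rare_class_weight - 1.0) * y_true
        return K.mean(weights * K.square(y_pred - y_true))
    return loss

# model.compile(optimizer='adam', loss=weighted_mse())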

Note: Find the latest version of this project on Github.


Contents

  • Project Components
      • Segmentation Model
      • Inference Speed Optimization
  • Usage
      • Installation
      • Running Inference for Official Grader
      • Training
      • Testing
  • Acknowledgements

Project Components

Segmentation Model

The overall shape of the segmentation model is loosely based on SegNet. At a smaller scale, individual convolutions in SegNet are replaced by Inception modules scaled to a similar output depth as the original SegNet convolutions. The Inception modules increase the learning potential of the network compared to the original SegNet, while also allowing greater flexibility to increase layer depths without an explosion in the number of connections.

I created two separate segmentation models tailored to the needs of the two classes: vehicle and road. Cars were much more difficult to classify accurately, so the vehicle model has roughly four times the output depth per Inception module compared with the road model. Keeping the models separate makes it possible to directly allocate extra learning potential and inference time to the more difficult classification problem.
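
The exact layer sizes aren't listed here, so the following is only a sketch of the pattern in Keras, which the project's Python code presumably uses: an Inception-style module with parallel 1x1, 3x3, and 5x5 convolutions plus a pooled branch, concatenated to a target depth, standing in wherever SegNet would use a single convolution.

from tensorflow.keras import layers

def inception_module(x, depth):
    # Four parallel branches concatenated to 'depth' total channels,
    # replacing a single SegNet convolution of that depth.
    b1 = layers.Conv2D(depth // 4, 1, padding='same', activation='relu')(x)
    b3 = layers.Conv2D(depth // 2, 3, padding='same', activation='relu')(x)
    b5 = layers.Conv2D(depth // 8, 5, padding='same', activation='relu')(x)
    bp = layers.MaxPooling2D(3, strides=1, padding='same')(x)
    bp = layers.Conv2D(depth // 8, 1, padding='same', activation='relu')(bp)
    return layers.Concatenate()([b1, b3, b5, bp])

# Hypothetical depths: the road model might use inception_module(x, 64)
# where the vehicle model uses inception_module(x, 256): the same
# architecture with roughly four times the output depth per module.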

Inference Speed Optimization

Self-driving cars need to interpret images at 10 frames per second to ensure that they can react quickly to unexpected changes. This challenge allows lower inference speeds, but deducts a steep penalty of one point per FPS below 10. The need to keep inference fast limits model complexity and makes it much harder to create accurate segmentation models.

Rather than accepting a hard limit on the complexity of segmentation models, I chose to optimize the surrounding inference process, leaving as much room as possible for powerful segmentation models.

  1. Inference server process avoids loading libraries and models during a realtime process.

Loading libraries and models can both take substantial time when a Python program first starts up. I solved this problem by starting a separate server process that loads all relevant libraries, loads the models, and even warms up on a small practice problem. After the server finishes the warmup process, a client script can go through the grading process and delegate work to the server process. The client and server communicate via ZeroMQ.

This approach is relevant for use on a real car because the FPS requirement is intended to measure latency in an ongoing realtime activity, for which load time is not relevant. If a perception module needed to provide fast inference immediately when the car is turned on, it could do so by first loading a lower-accuracy model that is sufficient for driveway use, then loading the stronger model before the car moves into a more difficult situation.
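
As a minimal sketch of this handoff (the endpoint, request format, and helper functions are assumptions for illustration; only the ZeroMQ request/reply pattern is taken from the description above):

# server sketch: run first, in its own console
import zmq

context = zmq.Context()
socket = context.socket(zmq.REP)
socket.bind("tcp://127.0.0.1:5555")        # hypothetical endpoint

models = load_models()                     # hypothetical: pay the startup
warm_up(models)                            # cost once, before any request

while True:
    video_path = socket.recv_string()      # wait for work from the client
    socket.send_string(run_inference(models, video_path))  # hypothetical

# client sketch: run after the server reports warmup complete
import zmq

context = zmq.Context()
socket = context.socket(zmq.REQ)
socket.connect("tcp://127.0.0.1:5555")
socket.send_string("/path/to/video.mp4")   # hypothetical request format
print(socket.recv_string())                # segmentation results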

  2. Multiprocessing better utilizes both CPU and GPU at the same time.

While inference is primarily performed by the GPU, several of the surrounding processes take substantial CPU time: loading frames from the input video, cropping and scaling to a smaller image size for inference, scaling back up after inference, and encoding the output in PNG format. If these processes are performed sequentially, then at any time only the GPU or one of the CPUs will be active. Instead, I created several parallel processes communicating via Python's multiprocessing Pipes. In a real car, this would mean that the perception pipeline could minimize latency by accepting a second image for pre-processing while previous images are still going through inference or post-processing.

Note: Python has weak support for multiprocessing. While this technique was effective, the same technique in C++ would likely provide a much better speedup.
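
A stripped-down sketch of that pipeline shape, with hypothetical helpers standing in for the real frame I/O and model code:

from multiprocessing import Process, Pipe

def preprocess_stage(inbound, outbound):
    # CPU work: crop and scale raw frames down to the inference resolution.
    while True:
        frame = inbound.recv()
        if frame is None:                     # sentinel: shut the stage down
            outbound.send(None)
            break
        outbound.send(crop_and_scale(frame))  # hypothetical helper

def encode_stage(inbound):
    # CPU work: scale masks back up and encode them as PNG.
    while True:
        mask = inbound.recv()
        if mask is None:
            break
        write_png(upscale(mask))              # hypothetical helpers

if __name__ == '__main__':
    raw_recv, raw_send = Pipe(duplex=False)
    small_recv, small_send = Pipe(duplex=False)
    mask_recv, mask_send = Pipe(duplex=False)
    Process(target=preprocess_stage, args=(raw_recv, small_send)).start()
    Process(target=encode_stage, args=(mask_recv,)).start()

    # The encoder works on frame N-1 while this loop preprocesses and runs
    # inference on frame N. A fuller pipeline would also prefetch frames so
    # that preprocessing overlaps inference.
    for frame in read_video_frames():         # hypothetical frame reader
        raw_send.send(frame)
        mask_send.send(model_predict(small_recv.recv()))  # GPU inference
    raw_send.send(None)
    mask_send.send(None)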

  3. Batching multiple images better utilizes the GPU.

At first I avoided batching because it seemed unrealistic. If I waited for 10 images to arrive and performed inference on all of them at once, I could meet the 10 FPS throughput requirement while providing latency equivalent to only 1 FPS.

Surprisingly, it turns out that batching is relevant for a real self-driving car. Self-driving cars have multiple cameras whose output must be processed in parallel. Tesla, for example, uses 8 optical cameras on each car. Batch inference is an effective way to process the output from all 8 cameras at once. Also, the hosted workspace used for this competition is about 5 times slower than my two-year-old Titan X, so I suspect that the same code running on modern hardware could handle 10 FPS inference on all 8 cameras at once.
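
A sketch of what that batched call could look like (the camera count and model API are assumptions; the project's actual code may differ):

import numpy as np

def segment_all_cameras(model, frames):
    # frames: preprocessed images from 8 cameras, each of shape (H, W, 3).
    # Stacking them into a single (8, H, W, 3) batch lets the GPU segment
    # all cameras in one forward pass instead of 8 sequential ones, without
    # the multi-frame waiting latency that batching a single video implies.
    batch = np.stack(frames)
    return model.predict(batch)                # shape: (8, H, W, classes)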

Usage

Installation

  1. Clone the repository
git clone https://github.com/ericlavigne/Lyft-Perception-Challenge
  2. Set up the virtualenv.
cd Lyft-Perception-Challenge
virtualenv -p python3 env
source env/bin/activate
./setup.sh
pip install -r requirements.txt
deactivate

Running Inference for Official Grader

This project uses a client/server architecture for inference in order to avoid paying a startup cost (loading libraries and models) during a real-time process. The server and client should be started in separate consoles. The server should be allowed to complete the warmup process before running the client.

  1. Running the server
cd Lyft-Perception-Challenge
./setup.sh
python submit_server.py

Wait for the warmup process to complete. The server will report that warmup has completed and show speed statistics for a small video on which it performs inference during the warmup process.

  2. Running the client
cd Lyft-Perception-Challenge
grader 'python submit_client.py'
submit

Training

The neural network training process assumes that training data can be found in /tmp/Train as a result of running setup.sh during the installation process.

python train_car.py
python train_road.py

Testing

For manual unit testing, the test.py script creates visual examples of each step in the /tmp/output directory. It assumes that training data can be found in /tmp/Train as a result of running setup.sh during the installation process.

python test.py

Acknowledgements

Ong Chin-Kiat (chinkiat), Phu Nguyen (phmagic), and Mohamed Eltohamy all collected extra training data from CARLA and posted that data for other students to use. I have no measurement of how this affected my project, but I suspect it was very helpful for improving accuracy.

Jay Wijaya (jaycode) shared the hint that OpenCV's VideoCapture was faster than the scikit-video operation that was used in Udacity's example script. This change improved my speed by 0.5 FPS.

Phu Nguyen (phmagic) shared the hint that OpenCV was faster than PIL for encoding to PNG format. This change cut the PNG encoding time in half but did not affect my overall speed because that part of the process was already moved into its own thread and not a bottleneck.
