Vehicle Detection and Tracking

https://github.com/bayne/CarDetection

The goals / steps of this project are the following:

  • Perform a Histogram of Oriented Gradients (HOG) feature extraction on a labeled training set of images and train a Linear SVM classifier
  • Optionally, you can also apply a color transform and append binned color features, as well as histograms of color, to your HOG feature vector.
  • Note: for those first two steps don’t forget to normalize your features and randomize a selection for training and testing.
  • Implement a sliding-window technique and use your trained classifier to search for vehicles in images.
  • Run your pipeline on a video stream (start with the test_video.mp4 and later implement on full project_video.mp4) and create a heat map of recurring detections frame by frame to reject outliers and follow detected vehicles.
  • Estimate a bounding box for vehicles detected.


Feature extraction

To detect cars in a frame we need to be able to classify subsamples of the full frame into two categories: car and non-car. A classifier requires us to provide it with features, which it uses to determine whether an image contains a car. In this project I focused on a particular type of feature called a Histogram of Oriented Gradients.

Histogram of Oriented Gradients (HOG)

The histogram of oriented gradients (HOG) is a feature descriptor used in computer vision and image processing for the purpose of object detection. The technique counts occurrences of gradient orientation in localized portions of an image. (Wikipedia)

I wrapped the call to the skimage.feature.hog() function in my own class called FeatureExtractor. I intended for this class to encapsulate the logic for extracting features from a given image. It was also useful for optimization in later parts of the pipeline.
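
A minimal sketch of what such a wrapper might look like; the constructor parameters and method names here are illustrative, not the repo's exact API:

```python
from skimage.feature import hog

class FeatureExtractor:
    """Encapsulates HOG feature extraction for a single window size."""

    def __init__(self, orientations=9, pixels_per_cell=8, cells_per_block=2):
        self.orientations = orientations
        self.pixels_per_cell = pixels_per_cell
        self.cells_per_block = cells_per_block

    def extract(self, channel):
        # channel is a single-channel 2D image (e.g. the S or V plane of HSV)
        return hog(
            channel,
            orientations=self.orientations,
            pixels_per_cell=(self.pixels_per_cell, self.pixels_per_cell),
            cells_per_block=(self.cells_per_block, self.cells_per_block),
            feature_vector=True,
        )
```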

Parameter selection

I chose the HOG parameters through trial and error, manually adjusting them until the results improved. The quality of the results was evaluated based on the timeliness and accuracy of the final output. The pixels_per_cell parameter, however, varied with the sliding-window scale so that features extracted from the video would match the feature size of the training data set.
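
For illustration, assuming 64×64 training images described by an 8×8 grid of HOG cells (pixels_per_cell = 8), the cell size has to scale with the window to keep the feature length constant:

```python
# Illustrative arithmetic, assuming 64x64 training images described by an
# 8x8 grid of HOG cells. Scaling pixels_per_cell with the window size keeps
# the cell grid, and therefore the feature vector length, constant.
CELLS_PER_WINDOW = 8  # 64 px / 8 px per cell

def pixels_per_cell_for(window_size):
    return window_size // CELLS_PER_WINDOW

print(pixels_per_cell_for(64))  # 8: matches the training images
print(pixels_per_cell_for(96))  # 12: still an 8x8 cell grid per window
```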

Training the classifier

The classifier was trained using only HOG features from the saturation and value channels of the HSV image. I initially only used the saturation channel but later found that including the value channel improved the performance significantly. I used a support vector machine to power the classifier via sklearn.svm.LinearSVC().

I wrapped the classifier in my own class called CarClassifier. The purpose of this was to split the training step out from the rest of the code and to limit the pickling to only the CarClassifier.
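
A sketch of how such a wrapper could fit together, reusing the FeatureExtractor sketch above; the StandardScaler and the exact method names are assumptions based on the description (HOG on the S and V channels of HSV, a LinearSVC, pickling limited to this class):

```python
# Sketch of the CarClassifier wrapper; details are assumptions.
import pickle
import cv2
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.preprocessing import StandardScaler

class CarClassifier:
    def __init__(self, feature_extractor):
        self.feature_extractor = feature_extractor
        self.scaler = StandardScaler()
        self.svc = LinearSVC()

    def _features(self, bgr_image):
        hsv = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2HSV)
        # Channels 1 and 2 of HSV are saturation and value
        return np.concatenate([
            self.feature_extractor.extract(hsv[:, :, 1]),
            self.feature_extractor.extract(hsv[:, :, 2]),
        ])

    def fit(self, images, labels):
        X = np.array([self._features(img) for img in images])
        X = self.scaler.fit_transform(X)  # normalize the features
        self.svc.fit(X, labels)

    def predict(self, image):
        X = self.scaler.transform([self._features(image)])
        return self.svc.predict(X)[0]

    def save(self, path):
        # Pickling is limited to this class
        with open(path, 'wb') as f:
            pickle.dump(self, f)
```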

Sub-sampling the full frame

Since the goal of the project is to find cars in a given frame, we need some mechanism for taking subsamples of the full frame. The classifier is fed these subsamples and tells us which are cars and which are not. The approach I went with for generating these subsamples was a sliding window search.

[Frame: full test image]

[Car: example subsample]

[Not a car: example subsample]

[Intermediate step (pipeline demonstration) image]

The simplest approach would be to slide windows of varying sizes across the entire image. Although this would yield the highest number of possible true positives, it would also be too computationally expensive; one of the constraints of this project is to keep the time required to process each frame low.

My implementation of the sliding-window search generates windows according to a handful of tunable parameters (a sketch follows the list below):

region

A rectangle that defines where windows are generated; this is how I prevented the window search from searching in the sky.

window_y_step_ratio

Defined the amount of overlap each window had when it slid in the y direction

window_x_step_ratio

Defined the amount of overlap each window had when it slid in the x direction

scale_rate_y

The rate at which the windows grew as they approached the bottom of the frame

min_window_size

The smallest window size, used for the windows nearest the top of the region
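
A sketch of how these parameters could drive window generation; the exact scaling scheme in the repo may differ:

```python
# Hypothetical window generator based on the parameters described above.
# region is ((x_min, y_min), (x_max, y_max)), e.g. ((0, 400), (1280, 656))
# to exclude the sky on a 1280x720 frame.
def generate_windows(region, min_window_size,
                     window_x_step_ratio=0.5, window_y_step_ratio=0.5,
                     scale_rate_y=1.3):
    (x_min, y_min), (x_max, y_max) = region
    windows = []
    size = min_window_size
    y = y_min
    while y + size <= y_max:
        x = x_min
        while x + size <= x_max:
            windows.append(((int(x), int(y)), (int(x + size), int(y + size))))
            x += size * window_x_step_ratio
        # Windows grow as they approach the bottom of the frame
        y += size * window_y_step_ratio
        size *= scale_rate_y
    return windows
```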

Optimization

The HOG feature extractor is an expensive operation, so optimization efforts focused on reducing the number of calls to that function. Initially I implemented the feature extractor to extract on demand when given a subsample; this resulted in the same areas of the image having the operation performed on them more than once. I optimized the process as follows (see the sketch after the list):

  1. Get every window size that will be used
  2. Create a FeatureExtractor for each given window size
  3. Extract the HOG features for each given window size in the region of interest
  4. Subsample the HOG features for any given window
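
The key to steps 3 and 4 is that hog() can be called with feature_vector=False, returning a block array that can be sliced per window instead of recomputed. A sketch of the idea, with assumed helper names:

```python
from skimage.feature import hog

def hog_blocks(channel, orientations=9, pix_per_cell=8, cell_per_block=2):
    # One expensive call per scale over the whole region of interest.
    # Shape: (n_blocks_y, n_blocks_x, cell_per_block, cell_per_block, orientations)
    return hog(channel,
               orientations=orientations,
               pixels_per_cell=(pix_per_cell, pix_per_cell),
               cells_per_block=(cell_per_block, cell_per_block),
               feature_vector=False)

def window_features(blocks, x, y, window_size, pix_per_cell=8, cell_per_block=2):
    # Convert pixel coordinates to block indices and flatten the slice
    bx = x // pix_per_cell
    by = y // pix_per_cell
    n = window_size // pix_per_cell - cell_per_block + 1
    return blocks[by:by + n, bx:bx + n].ravel()
```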

Development optimization

Since my development cycle involved a significant amount of parameter tuning, it paid off to reduce the time required to generate an output video. The generated HOG features were also saved to disk for use in future iterations. This was implemented in a class I refer to as the FeatureExtractorCache.
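
A sketch of what such a cache could look like; the key scheme and file layout are assumptions:

```python
# Hypothetical disk cache wrapping a FeatureExtractor.
import os
import pickle

class FeatureExtractorCache:
    def __init__(self, feature_extractor, cache_dir='hog_cache'):
        self.feature_extractor = feature_extractor
        self.cache_dir = cache_dir
        os.makedirs(cache_dir, exist_ok=True)

    def extract(self, channel, key):
        # key identifies the frame and scale, e.g. 'frame0042_scale64'
        path = os.path.join(self.cache_dir, key + '.p')
        if os.path.exists(path):
            with open(path, 'rb') as f:
                return pickle.load(f)
        features = self.feature_extractor.extract(channel)
        with open(path, 'wb') as f:
            pickle.dump(features, f)
        return features
```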

Processing the video


[video](https://drive.google.com/file/d/0B1CQ1n9EZIF6RjBOOFhYSW1DeDg/view?usp=sharing)

Up to this point the pipeline was focused on a single frame. When given a video, there is additional information that can be used to accurately find a car in the image. I utilized a heatmap that persisted from frame to frame and was the underlying structure that powered the eventual annotations (bounding boxes around the cars) on the frame.

Once again I turned the heatmap into a class which I called Heatmap. The heatmap collected the windows that were identified as cars by the classifier and attempted to filter out false positives.


The filtering of the false positives was controlled by the following tunable parameters (see the sketch after this list):

warmup_rate

This is the value that gets added to the heatmap when a given pixel is found in a window that was positively identified as a car

cooldown_rate

The rate at which all pixels in the heatmap decrease. This prevented false positives from persisting in the heatmap

threshold

The minimum value of a pixel in the heatmap to be considered as a true positive. This is eventually used when generating the bounding box.
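
Putting the three parameters together, a minimal sketch of the Heatmap behaviour; the names mirror the prose above, while details such as the update order are assumed:

```python
import numpy as np

class Heatmap:
    def __init__(self, shape, warmup_rate=1.0, cooldown_rate=0.5, threshold=4.0):
        self.heat = np.zeros(shape, dtype=np.float32)
        self.warmup_rate = warmup_rate
        self.cooldown_rate = cooldown_rate
        self.threshold = threshold

    def update(self, positive_windows):
        # Cool every pixel first so stale detections fade away
        self.heat = np.maximum(self.heat - self.cooldown_rate, 0)
        # Warm up every pixel inside a window the classifier called a car
        for (x1, y1), (x2, y2) in positive_windows:
            self.heat[y1:y2, x1:x2] += self.warmup_rate

    def mask(self):
        # Binary mask of the pixels considered true positives
        return (self.heat >= self.threshold).astype(np.uint8)
```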

Bounding box

The bounding box was the final output and what was used to annotate the frame with a true positive of a car. This was relatively straightforward and merely used a contour finding function provided by OpenCV. The contour was generated around the thresholded heatmap values.
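
A sketch of this final step, assuming the Heatmap sketch above and the OpenCV 4.x return signature of cv2.findContours():

```python
import cv2

def draw_bounding_boxes(frame, heatmap):
    mask = heatmap.mask()  # thresholded heatmap as a binary image
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    for contour in contours:
        # Annotate the frame with a box around each contour
        x, y, w, h = cv2.boundingRect(contour)
        cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 0, 255), 3)
    return frame
```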

Possible problems/improvements