Image classification takes a patch of an image as input and outputs what the patch contains, known as the "class label".
Object detection finds all instances of an object class in an image. The input is an image and the output is a set of rectangles (bounding boxes), each corresponding to the location of an object we want to detect.
Feature Descriptor : For both image classification and object detection, we need a useful feature descriptor that extracts the important information in the image and reduces an image of size width x height x 3 (channels) to a feature vector of length n.
What is a "useful" feature descriptor? In the feature space of the feature descriptor we are looking for two things:
1. The images of objects of the same class should be close together.
2. The images of objects of two different classes should be far apart.
In this article we use the HOG (Histogram of Oriented Gradients) feature descriptor with an SVM (Support Vector Machine) to do image classification and object detection. In HOG, the distribution (histogram) of gradient directions (oriented gradients) is used as the feature, because the gradient magnitude is large around the edges in the picture, and the edges give us information about object shape. The following sections give a brief introduction to each step.
Step 1 : Preprocessing & Normalization
The HOG feature descriptor is calculated on a 64×128 patch of an image (the size used for pedestrian detection). Patches at different scales are analyzed at many image locations, so the only constraint is that the patches have a fixed aspect ratio, which lets each one be resized to 64×128. In our case, the patches need to have an aspect ratio of 1:2 (width:height).
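As a minimal sketch of the aspect-ratio constraint, the helper below computes the largest centered 1:2 crop of an arbitrary region, which could then be resized to 64×128. The function name crop_box_for_aspect and the choice of a centered crop are assumptions made for illustration, not something the article prescribes.

```python
def crop_box_for_aspect(width, height, aspect_w=1, aspect_h=2):
    """Return (x, y, w, h) of the largest centered box with ratio aspect_w:aspect_h.

    Hypothetical helper: a centered crop is only one possible policy.
    """
    # The box width is limited by the image width and, through the ratio,
    # by the image height.
    box_w = min(width, height * aspect_w // aspect_h)
    box_h = box_w * aspect_h // aspect_w
    # Center the box inside the image.
    x = (width - box_w) // 2
    y = (height - box_h) // 2
    return x, y, box_w, box_h


if __name__ == "__main__":
    # A 300x200 region only has room for a 100x200 box with a 1:2 ratio.
    print(crop_box_for_aspect(300, 200))
```

The resulting box keeps the 1:2 ratio, so scaling it to 64×128 does not distort the object.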
Step 2 : Calculate the Gradient Images
First, we calculate the horizontal and vertical gradients by filtering the image with the kernels [-1, 0, 1] and [-1, 0, 1]^T. (In OpenCV, the Sobel operator with kernel size 1 does this.) Second, we find the magnitude and direction of the gradient at every pixel; the OpenCV function cartToPolar computes both from the two gradient images.
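The two sub-steps can be sketched in plain Python to show what the filtering and the polar conversion compute. This is an illustrative sketch, not the OpenCV implementation: the function name gradients is hypothetical, the image is assumed to be a single-channel list of rows, and border pixels are replicated, which may differ from OpenCV's border handling.

```python
import math


def gradients(image):
    """Return (magnitude, angle_deg) grids for a grayscale image (list of rows)."""
    h, w = len(image), len(image[0])
    mag = [[0.0] * w for _ in range(h)]
    ang = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Assumption: replicate border pixels so the kernel is defined everywhere.
            left = image[y][max(x - 1, 0)]
            right = image[y][min(x + 1, w - 1)]
            up = image[max(y - 1, 0)][x]
            down = image[min(y + 1, h - 1)][x]
            gx = right - left  # response of the kernel [-1, 0, 1]
            gy = down - up     # response of the kernel [-1, 0, 1]^T
            # Cartesian (gx, gy) to polar (magnitude, direction in 0-360 degrees).
            mag[y][x] = math.hypot(gx, gy)
            ang[y][x] = math.degrees(math.atan2(gy, gx)) % 360.0
    return mag, ang


if __name__ == "__main__":
    # A vertical edge: dark on the left, bright on the right.
    edge = [[0, 0, 10, 10] for _ in range(3)]
    magnitude, angle = gradients(edge)
    print(magnitude[1], angle[1])
```

The magnitude is non-zero only next to the edge, which is why gradients capture shape rather than flat regions.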
Step 3 : Calculate Histogram of Gradients in 8×8 cells
In this step, the image is divided into 8×8 cells and a histogram of gradients is calculated for each cell. The gradient of an 8×8 cell contains 2 values (magnitude and direction) per pixel, which add up to 8 x 8 x 2 = 128 numbers. We summarize these 128 numbers in a 9-bin histogram.
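How 128 numbers turn into 9 bins can be sketched as follows. The sketch assumes the common HOG convention of unsigned gradients (directions folded into 0-180 degrees, bins at 0, 20, ..., 160) with each pixel's magnitude split between the two nearest bins; the article does not spell out these details, and the function name cell_histogram is hypothetical.

```python
def cell_histogram(magnitudes, angles_deg, n_bins=9):
    """Build an n_bins histogram from flat lists of per-pixel magnitude and direction."""
    bin_width = 180.0 / n_bins  # 20 degrees for 9 bins
    hist = [0.0] * n_bins
    for m, a in zip(magnitudes, angles_deg):
        a = a % 180.0              # assumption: unsigned gradients, 0-180 degrees
        pos = a / bin_width        # fractional bin position
        lo = int(pos) % n_bins     # bin at or below the direction
        hi = (lo + 1) % n_bins     # next bin; wraps from 160 back to 0
        frac = pos - int(pos)
        # Split the magnitude between the two neighboring bins.
        hist[lo] += m * (1.0 - frac)
        hist[hi] += m * frac
    return hist


if __name__ == "__main__":
    # One pixel with magnitude 4 and direction 10 degrees votes half into
    # the 0-degree bin and half into the 20-degree bin.
    print(cell_histogram([4.0], [10.0]))
```

For a full 8×8 cell, the inputs would be the 64 magnitudes and 64 directions of that cell, flattened into two lists.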
Step 4 : Normalization
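Gradients of an image are sensitive to lighting, so we "normalize" the histogram to make the descriptor less dependent on lighting changes. We use L2 normalization: the histogram vector is divided by its L2 norm, in the same way an RGB color vector can be scaled to unit length. In OpenCV, the HOGDescriptor normalization type L2Hys does this.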
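The effect of L2 normalization can be checked with a small sketch. The function names are hypothetical. The L2-Hys variant is written here as "L2 normalize, clip, L2 normalize again", and the clipping value of 0.2 is an assumption taken from the commonly used default threshold.

```python
import math


def l2_normalize(vec, eps=1e-12):
    """Divide a vector by its L2 norm; eps avoids division by zero."""
    norm = math.sqrt(sum(v * v for v in vec) + eps)
    return [v / norm for v in vec]


def l2_hys(vec, clip=0.2):
    """L2 normalize, clip each value at `clip`, then L2 normalize again.

    The clip value 0.2 is an assumed default, not a value from the article.
    """
    clipped = [min(v, clip) for v in l2_normalize(vec)]
    return l2_normalize(clipped)


if __name__ == "__main__":
    # Doubling every value (a brighter image) gives the same normalized vector.
    print(l2_normalize([128, 64, 32]))
    print(l2_normalize([256, 128, 64]))
```

Because both inputs differ only by a constant factor, they normalize to the same vector, which is the lighting invariance the step is after.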
After we get the HOG feature descriptors, we feed them into a Support Vector Machine to train our classifier. We keep 20% of the data as the test set.
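A minimal sketch of the 80/20 split is shown below. The function name train_test_split, the shuffling, and the fixed seed are illustrative assumptions. Training the SVM itself would be done with a library and is not shown here.

```python
import random


def train_test_split(features, labels, test_fraction=0.2, seed=0):
    """Shuffle sample indices and hold out `test_fraction` of them for testing."""
    indices = list(range(len(features)))
    random.Random(seed).shuffle(indices)  # fixed seed makes the split repeatable
    n_test = int(round(len(indices) * test_fraction))
    test_idx, train_idx = indices[:n_test], indices[n_test:]

    def pick(seq, idx):
        return [seq[i] for i in idx]

    return (pick(features, train_idx), pick(labels, train_idx),
            pick(features, test_idx), pick(labels, test_idx))


if __name__ == "__main__":
    # Ten placeholder descriptors with alternating labels.
    feats = [[float(i)] for i in range(10)]
    labs = [i % 2 for i in range(10)]
    x_train, y_train, x_test, y_test = train_test_split(feats, labs)
    print(len(x_train), len(x_test))
```

Each descriptor stays paired with its label, and the held-out 20% is never seen during training.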
Object detection builds on the classifier and additionally has to report the locations of objects. The objects in a picture appear at multiple scales, so we build an image pyramid so that objects of different sizes can match the fixed size of the detection patch. The SVM classifies each patch to decide whether it contains the object we are looking for.
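The search over scales and positions can be sketched without any image data by enumerating the candidate boxes. All function names, the pyramid scale step of 1.25, and the window stride of 8 pixels are assumptions for illustration. In a full detector each box would be cropped, turned into a HOG descriptor, and passed to the SVM.

```python
def pyramid_scales(width, height, win_w=64, win_h=128, scale_step=1.25):
    """Yield (scale, scaled_width, scaled_height) while the window still fits."""
    scale = 1.0
    while True:
        w, h = int(width / scale), int(height / scale)
        if w < win_w or h < win_h:
            break
        yield scale, w, h
        scale *= scale_step


def sliding_windows(width, height, win_w=64, win_h=128, stride=8):
    """Yield the top-left (x, y) of every window position in an image."""
    for y in range(0, height - win_h + 1, stride):
        for x in range(0, width - win_w + 1, stride):
            yield x, y


def candidate_boxes(width, height, win_w=64, win_h=128, scale_step=1.25, stride=8):
    """Yield (x, y, w, h) boxes in original-image coordinates for every level."""
    for scale, w, h in pyramid_scales(width, height, win_w, win_h, scale_step):
        for x, y in sliding_windows(w, h, win_w, win_h, stride):
            # A fixed window on a shrunken image covers a larger original area.
            yield (int(x * scale), int(y * scale),
                   int(win_w * scale), int(win_h * scale))


if __name__ == "__main__":
    boxes = list(candidate_boxes(128, 256, scale_step=2.0, stride=64))
    print(len(boxes), boxes[-1])
```

Shrinking the image while keeping the window fixed is what lets one 64×128 classifier find objects of different sizes.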