Achour Idoughi - Project03
Introduction
Object detection is a method for detecting the presence of an instance or a class of objects in a digital
image; face detection and person detection are examples. These methods have applications in multiple
fields, such as content-based image search and video surveillance.
Before an object can be extracted or classified, the shape and texture that distinguish it from other
objects must be defined. These two properties can be found by combining feature elements in a given
image (points, lines, edges, and corners).
Many algorithms have been presented in the literature to extract features from images. In this
project, the Histogram of Oriented Gradients (HOG) is discussed and used to extract
features from an image dataset that includes both a training set and a testing set. The extracted
features are used to train and test a linear Support Vector Machine (SVM) classifier to
detect pedestrians.
• The Algorithm
Here are the steps used to extract image features using HOG:
• Step 1: Preprocessing
- A preprocessing step, such as a median filter, can be carried out before computing the gradient.
- Resize the images to 64*128, fixing the width-to-height ratio at 1:2.
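A minimal sketch of this preprocessing step (the file name and the 3*3 filter neighborhood are assumptions, not taken from the attached code):

% Median-filter the image to suppress noise, then resize it to the
% 64*128 detection window used throughout this project.
img = imread('pedestrian.png');      % hypothetical input image
if size(img, 3) == 3
    img = rgb2gray(img);             % HOG operates on intensity values
end
img = medfilt2(img, [3 3]);          % median filter over an assumed 3x3 neighborhood
img = imresize(img, [128 64]);       % 128 rows x 64 columns, i.e. 64*128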
• Step 2: Magnitude and orientation
- In the first step of computing a HOG descriptor, the horizontal and vertical gradients must be
calculated. This is achieved by filtering the image with derivative kernels such as Sobel or
Prewitt; in this project, simple derivative operators are used.
- For each pixel, the magnitude (M) and the direction (orientation θ) of the gradient are computed as
follows:
$$M = \sqrt{d_x^2 + d_y^2}$$

$$\theta = \tan^{-1}\left(\frac{d_y}{d_x}\right)$$

where $d_x$ and $d_y$ are the horizontal and vertical gradients, respectively.
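The following sketch computes these quantities in MATLAB; the centered kernel [-1 0 1] is a common choice for HOG and is assumed here, since the report's exact operators are not reproduced:

% Horizontal and vertical gradients by filtering with a 1-D derivative kernel.
I  = im2double(img);
dx = imfilter(I, [-1 0 1],  'replicate');   % horizontal gradient
dy = imfilter(I, [-1 0 1]', 'replicate');   % vertical gradient
M     = sqrt(dx.^2 + dy.^2);                % gradient magnitude
theta = mod(atan2d(dy, dx), 180);           % unsigned orientation in [0, 180)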
- The image is divided into 8*8 cells, and a histogram of oriented gradients is computed for
each cell. This is done to obtain features for the smaller patches that together represent the whole
image. The cell size can be changed from 8*8 to 16*16 or 32*32.
The following figure illustrates how a given image is divided into cells and blocks:
- The generated histogram is a magnitude-versus-orientation histogram. The orientation range of
0-180 degrees is divided into 9 bins of 20 degrees each. The pixels whose orientation falls within
a bin's limits are found, and the sum of their magnitudes is accumulated in that bin.
- A pixel's gradient can also contribute to the two bins on either side of its orientation; the
larger contribution goes to the bin whose center is closer to that orientation, as sketched below.
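A sketch of this binning for a single 8*8 cell, assuming the M and theta arrays from the gradient step and bin centers at 10, 30, ..., 170 degrees (an assumed but common convention):

% 9-bin orientation histogram for one cell, splitting each pixel's
% magnitude between the two nearest bin centers.
cellM  = M(1:8, 1:8);                          % top-left cell, for illustration
cellTh = theta(1:8, 1:8);
hist9  = zeros(1, 9);
binWidth = 20;
for p = 1:numel(cellM)
    t  = cellTh(p);
    b  = floor((t - 10) / binWidth);           % lower bin index (0-based)
    w  = (t - (b * binWidth + 10)) / binWidth; % share going to the upper bin
    lo = mod(b, 9) + 1;                        % wraps around 0/180 degrees
    hi = mod(b + 1, 9) + 1;
    hist9(lo) = hist9(lo) + (1 - w) * cellM(p);
    hist9(hi) = hist9(hi) + w * cellM(p);
end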
Although HOG features are found for the 8*8 cells of the image, some portions of the image may be
much brighter than others due to lighting. One way to reduce this effect is to normalize the
gradients over 16*16 blocks.
Each block thus yields four 9*1 histograms, concatenated into a single 36-element vector. This
vector is normalized by dividing it by the square root of the sum of squares of its element values.
For a vector of the form

$$V = [a_1, a_2, \ldots, a_{36}]$$

the normalization factor is

$$k = \sqrt{a_1^2 + a_2^2 + \cdots + a_{36}^2}$$

and all the values in $V$ are divided by $k$.
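In MATLAB this normalization is a one-liner; h1 through h4 stand for the four 9-bin cell histograms of a block (hypothetical names), and the small epsilon guarding against division by zero is an assumed implementation detail:

V  = [h1 h2 h3 h4];              % concatenated 1x36 block vector
k  = sqrt(sum(V.^2) + 1e-12);    % k = sqrt(a1^2 + ... + a36^2)
Vn = V / k;                      % normalized block descriptor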
For a 64*128 image, there are 105 overlapping 16*16 blocks (7 across * 15 down, since blocks step
by one 8-pixel cell), each described by a 36-element feature vector. The total number of features
for the image is therefore 105 * 36 = 3780.
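This count can be checked against the Computer Vision Toolbox implementation, whose defaults (8*8 cells, 2*2-cell blocks, 9 bins) match the parameters above; the image name is hypothetical:

img  = imresize(imread('pedestrian.png'), [128 64]);
feat = extractHOGFeatures(img, 'CellSize', [8 8]);
numel(feat)    % returns 3780 = 105 blocks * 36 values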
- In the second part of the process, the trained SVM model makes the decision. If the tested
feature vector lies on the same side of the hyperplane as the positive examples, then it is an
element of the class; otherwise, it is not.
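For a linear SVM, this decision reduces to the sign of an affine function of the feature vector $x$:

$$f(x) = \operatorname{sign}\left(w^{T}x + b\right)$$

where $w$ is the normal vector of the separating hyperplane and $b$ its offset; here $f(x) = +1$ would be read as 'Pedestrian' and $f(x) = -1$ as 'No_pedestrian'.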
The following flowchart describes the different steps used throughout this project: HOG feature
extraction followed by SVM classification.
Results and discussion
1- Task 01
A MATLAB program implementing the Histogram of Oriented Gradients (HOG) for feature
extraction was written. The features of both the training and testing images are extracted by a
script that calls this function, as sketched below.
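A sketch of such a script (the folder layout and the function name hog_features are assumptions; the actual code is in the attached SVM_Achour.m):

% Load the training images and extract one 3780-element descriptor each.
train_Set = imageDatastore('train', 'IncludeSubfolders', true, ...
                           'LabelSource', 'foldernames');
trainingFeatures = zeros(numel(train_Set.Files), 3780);
for i = 1:numel(train_Set.Files)
    img = readimage(train_Set, i);
    trainingFeatures(i, :) = hog_features(img);   % custom HOG function
end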
Here is an example of one of the training dataset images with its extracted features:
2- Task 02
In this task, a classifier is trained using the training image dataset. The MATLAB function
fitcsvm is used, as follows:
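A minimal call consistent with this description (variable names are assumed from the sketches above):

% Train a linear SVM on the extracted HOG descriptors.
classifier = fitcsvm(trainingFeatures, train_Set.Labels, ...
                     'KernelFunction', 'linear');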
The labels of the images are changed to 'No_pedestrian' and 'Pedestrian' to illustrate exactly what the
images represent.
To test the classifier, the MATLAB function predict is used, taking the trained classifier and the
features of the testing images as inputs.
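For example, assuming the testing features were extracted the same way as the training features:

% Classify each test descriptor with the trained SVM.
predicted_labels = predict(classifier, testingFeatures);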
- The results showed good accuracy in recognizing images in which pedestrians are
present. To illustrate the results, the confusion matrix is computed and then displayed. It is
shown in the following figure.
Confusion matrix
- We can see from the figure that, out of 500 no_pedestrian images, 496 were recognized as such,
while 4 were wrongly detected as pedestrian images.
- 483 of the 500 images showing pedestrians were recognized as such; 17 predictions were wrong.
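A sketch of how such a matrix can be computed and displayed in MATLAB (variable names assumed, consistent with the snippets above):

% Rows are true classes, columns are predicted classes.
cm = confusionmat(test_Set.Labels, predicted_labels);
confusionchart(cm, categories(test_Set.Labels));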
• The accuracy can be calculated from the above confusion matrix:
$$\text{Accuracy} = \frac{496 + 483}{1000} = 97.90\%$$
Or it can be found by counting how many predicted labels match the actual dataset labels and
dividing that count by the total number of testing images. This was done in MATLAB as:
testingLabels = test_Set.Labels;                               % ground-truth labels of the test set
n = size(testingLabels);                                       % n(1) is the number of test images
Accuracy = sum(predicted_labels(:,1) == testingLabels)/n(1);   % fraction of correct predictions
• The false positive rate (FPR) corresponds to the number of times an image is classified as
containing a pedestrian when it actually does not:

$$\text{FPR} = \frac{4}{500} = 0.8\%$$
• The miss rate, the fraction of pedestrian images classified as containing no pedestrian, can also
be computed from the confusion matrix:

$$\text{Miss rate} = \frac{17}{500} = 3.40\%$$
The complete code can be found attached as SVM_Achour.m.
Conclusion
HOG features were extracted and then classified in this project; the algorithm gave good results
in extracting features from different images. The features, when used to train and test a classifier,
gave accurate image classification. Nevertheless, the simulation took some time because of the
length of each image's feature vector. If large image regions are used, extremely large feature
vectors are computed; to avoid this, the images can be resized to a reasonable size, or the HOG
parameters (cells, blocks …) can be tuned.
SVM classifiers are efficient because the resulting model depends only on a subset of the training
data (the support vectors), so for relatively small datasets they are a good choice for
classification. ANN methods might be considered when there is a large number of training instances.