Convolution Operation
Convolution is a mathematical operation used to extract features from an image. The convolution is defined by an image kernel, which is simply a small matrix; a 3x3 kernel is the most common choice.
In the figure below, the green matrix is the original image and the yellow moving matrix is the kernel, which is used to learn the different features of the original image. The kernel first moves horizontally, then shifts down and moves horizontally again. At each position, the sum of the element-wise products of the image pixel values and the kernel values gives one entry of the output matrix. The kernel values are initialized randomly and are learned during training.
To understand the concept of edge detection, consider a simplified image as an example, as in the sketch below.
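As a rough sketch of how the kernel slides over the image (the specific pixel values and the NumPy-based implementation below are illustrative assumptions, not taken from the figure), the following snippet convolves a simplified 6x6 image containing a vertical edge with a hand-picked 3x3 edge-detection kernel:

```python
import numpy as np

def convolve2d(image, kernel):
    """Slide the kernel over the image and sum the element-wise
    products at each position (no padding, stride 1)."""
    n, f = image.shape[0], kernel.shape[0]
    out_size = n - f + 1
    output = np.zeros((out_size, out_size))
    for i in range(out_size):
        for j in range(out_size):
            patch = image[i:i + f, j:j + f]
            output[i, j] = np.sum(patch * kernel)
    return output

# A simplified 6x6 image: bright (10) on the left, dark (0) on the right,
# so there is a vertical edge down the middle.
image = np.array([
    [10, 10, 10, 0, 0, 0],
    [10, 10, 10, 0, 0, 0],
    [10, 10, 10, 0, 0, 0],
    [10, 10, 10, 0, 0, 0],
    [10, 10, 10, 0, 0, 0],
    [10, 10, 10, 0, 0, 0],
], dtype=float)

# A 3x3 vertical edge-detection kernel; in a CNN these values would be
# initialized randomly and learned during training.
kernel = np.array([
    [1, 0, -1],
    [1, 0, -1],
    [1, 0, -1],
], dtype=float)

print(convolve2d(image, kernel))
```

The resulting 4x4 output is large only in the columns where the bright and dark regions meet, which is exactly the vertical edge this kernel responds to.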
Padding
What Is Padding
Padding is a technique used to preserve the spatial dimensions of the input image after convolution. It involves adding extra pixels (usually zeros) around the border of the input feature map before the convolution is applied.
Valid Padding: With valid padding, no padding is added to the input feature map, so the output feature map is smaller than the input. This is useful when we want to reduce the spatial dimensions of the feature maps.
Same Padding: With same padding, padding is added to the input feature map so that the output feature map has the same size as the input. This is useful when we want to preserve the spatial dimensions of the feature maps. Both options are illustrated in the sketch below.
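As a quick illustration of the two options (assuming SciPy, which this article does not otherwise use; its convolve2d flips the kernel, but that does not change the output sizes shown here):

```python
import numpy as np
from scipy.signal import convolve2d

image = np.random.rand(6, 6)   # a 6x6 input feature map
kernel = np.random.rand(3, 3)  # a 3x3 kernel

# Valid padding: no extra pixels are added, so the output shrinks to 4x4.
valid = convolve2d(image, kernel, mode="valid")
print(valid.shape)  # (4, 4)

# Same padding: zeros are added around the border, so the output stays 6x6.
same = convolve2d(image, kernel, mode="same", boundary="fill", fillvalue=0)
print(same.shape)   # (6, 6)
```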
Padding was introduced to solve two issues with plain convolution: the output shrinks after every convolution, and pixels at the border of the image are covered by fewer kernel positions than pixels in the interior, so border information is under-used. Padding preserves the size of the original image.
So if an n x n image is convolved with an f x f kernel with padding p, then the size of the output image will be (n + 2p - f + 1) * (n + 2p - f + 1), where p = 1 gives same padding for a 3x3 kernel.
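For example, a 6x6 image convolved with a 3x3 kernel and p = 1 gives an output of size 6 + 2(1) - 3 + 1 = 6, so the original 6x6 size is preserved; with p = 0 (valid padding) the output would shrink to 4x4.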
Stride
Figure: left image, stride = 0; middle image, stride = 1; right image, stride = 2.
Stride is the number of pixels by which the kernel shifts over the input matrix at each step. For padding p, filter size f x f, input image size n x n, and stride s, the output image dimension will be [ (n + 2p - f) / s + 1 ] * [ (n + 2p - f) / s + 1 ], where the division is rounded down; for s = 1 this reduces to the (n + 2p - f + 1) formula above.
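A small helper (a hypothetical function, written here in Python just to check the formula) that computes the output size for a few combinations of padding and stride:

```python
def conv_output_size(n, f, p, s):
    """Output side length for an n x n input, f x f filter,
    padding p, and stride s: floor((n + 2p - f) / s) + 1."""
    return (n + 2 * p - f) // s + 1

# A 7x7 input with a 3x3 filter:
print(conv_output_size(7, 3, 0, 1))  # 5  (valid padding, stride 1)
print(conv_output_size(7, 3, 1, 1))  # 7  (same padding, stride 1)
print(conv_output_size(7, 3, 0, 2))  # 3  (stride 2 roughly halves the resolution)
```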