
Introduction to Pooling Layer

The pooling operation involves sliding a two-dimensional filter over each channel of the feature map and summarising the features lying within the region covered by the filter. For a feature map of dimensions nh x nw x nc, the output obtained after a pooling layer with filter size f and stride s has dimensions

[floor((nh - f) / s) + 1] x [floor((nw - f) / s) + 1] x nc

where,
-> nh - height of feature map
-> nw - width of feature map
-> nc - number of channels in the feature map
-> f - size of filter
-> s - stride length
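
As a quick illustration, here is a small helper (a hypothetical sketch in Python; the floor accounts for strides that do not divide the feature map evenly) that computes these output dimensions:

    import math

    def pooled_output_shape(nh, nw, nc, f, s):
        # Output dimensions of a pooling layer with an (f x f) filter and stride s
        out_h = math.floor((nh - f) / s) + 1
        out_w = math.floor((nw - f) / s) + 1
        return out_h, out_w, nc

    # e.g. a 32 x 32 x 64 feature map pooled with a 2 x 2 filter at stride 2
    print(pooled_output_shape(32, 32, 64, 2, 2))  # (16, 16, 64)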

A common CNN model architecture is to have a number of convolution and pooling layers stacked one after the other.
Why Use Pooling Layers?
 Pooling layers are used to reduce the dimensions of the
feature maps. Thus, it reduces the number of parameters to
learn and the amount of computation performed in the
network.
 The pooling layer summarises the features present in a
region of the feature map generated by a convolution layer.
So, further operations are performed on summarised
features instead of precisely positioned features generated
by the convolution layer. This makes the model more
robust to variations in the position of the features in the
input image.

Types of Pooling (Subsampling or Downsampling) Layers:


Max Pooling
Max pooling is a pooling operation that selects the maximum element from the region of the feature map covered by the filter. Thus, the output after a max-pooling layer is a feature map containing the most prominent features of the previous feature map.
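
A minimal sketch of max pooling using PyTorch's nn.MaxPool2d (the 4 x 4 input values here are made up for illustration):

    import torch
    import torch.nn as nn

    x = torch.tensor([[[[1., 3., 2., 1.],
                        [4., 6., 5., 2.],
                        [7., 8., 9., 4.],
                        [1., 2., 3., 0.]]]])   # shape (1, 1, 4, 4)

    max_pool = nn.MaxPool2d(kernel_size=2, stride=2)
    print(max_pool(x))
    # tensor([[[[6., 5.],
    #           [8., 9.]]]])  -- the maximum of each 2 x 2 patch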

Average Pooling
Average pooling computes the average of the elements present in the region of the feature map covered by the filter. Thus, while max pooling gives the most prominent feature in a particular patch of the feature map, average pooling gives the average of the features present in a patch.
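
Passing the same 4 x 4 input as above through nn.AvgPool2d instead returns the mean of each patch (again a sketch with made-up values):

    import torch
    import torch.nn as nn

    x = torch.tensor([[[[1., 3., 2., 1.],
                        [4., 6., 5., 2.],
                        [7., 8., 9., 4.],
                        [1., 2., 3., 0.]]]])   # shape (1, 1, 4, 4)

    avg_pool = nn.AvgPool2d(kernel_size=2, stride=2)
    print(avg_pool(x))
    # tensor([[[[3.5000, 2.5000],
    #           [4.5000, 4.0000]]]])  -- the average of each 2 x 2 patch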

Global Pooling
Global pooling reduces each channel in the feature map to a single value. Thus, an nh x nw x nc feature map is reduced to a 1 x 1 x nc feature map. This is equivalent to using a filter of dimensions nh x nw, i.e. the dimensions of the feature map. Further, it can be either global max pooling or global average pooling.
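
A sketch of both global variants, using PyTorch's adaptive pooling layers with an output size of 1 (the 7 x 7 x 64 input shape is just an example):

    import torch
    import torch.nn as nn

    x = torch.randn(1, 64, 7, 7)          # nc = 64 channels, nh = nw = 7

    global_avg = nn.AdaptiveAvgPool2d(1)  # global average pooling
    global_max = nn.AdaptiveMaxPool2d(1)  # global max pooling

    print(global_avg(x).shape)  # torch.Size([1, 64, 1, 1])
    print(global_max(x).shape)  # torch.Size([1, 64, 1, 1])
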
In convolutional neural networks (CNNs), the pooling layer is a
common type of layer that is typically added after convolutional
layers. The pooling layer is used to reduce the spatial dimensions
(i.e., the width and height) of the feature maps, while preserving
the depth (i.e., the number of channels).
1. The pooling layer works by dividing the input feature map
into a set of non-overlapping regions, called pooling
regions. Each pooling region is then transformed into a
single output value, which represents the presence of a
particular feature in that region. The most common types of
pooling operations are max pooling and average pooling.
2. In max pooling, the output value for each pooling region is
simply the maximum value of the input values within that
region. This has the effect of preserving the most salient
features in each pooling region, while discarding less
relevant information. Max pooling is often used in CNNs for
object recognition tasks, as it helps to identify the most
distinctive features of an object, such as its edges and
corners.
3. In average pooling, the output value for each pooling region
is the average of the input values within that region. This
has the effect of preserving more information than max
pooling, but may also dilute the most salient features.
Average pooling is often used in CNNs for tasks such as
image segmentation and object detection, where a more
fine-grained representation of the input is required.
Pooling layers are typically used in conjunction with convolutional
layers in a CNN, with each pooling layer reducing the spatial
dimensions of the feature maps, while the convolutional layers
extract increasingly complex features from the input. The resulting
feature maps are then passed to a fully connected layer, which
performs the final classification or regression task.
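
A minimal sketch of this stacking pattern (the layer widths, 32 x 32 RGB input, and 10 output classes are hypothetical choices):

    import torch
    import torch.nn as nn

    model = nn.Sequential(
        nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
        nn.MaxPool2d(2),                # 32 x 32 -> 16 x 16
        nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
        nn.MaxPool2d(2),                # 16 x 16 -> 8 x 8
        nn.Flatten(),
        nn.Linear(32 * 8 * 8, 10),      # fully connected classifier
    )

    print(model(torch.randn(1, 3, 32, 32)).shape)  # torch.Size([1, 10])
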
Advantages of Pooling Layer:
1. Dimensionality reduction: The main advantage of pooling
layers is that they help in reducing the spatial dimensions
of the feature maps. This reduces the computational cost
and also helps in avoiding overfitting by reducing the
number of parameters in the model.
2. Translation invariance: Pooling layers are also useful in
achieving translation invariance in the feature maps. This
means that the position of an object in the image does not
affect the classification result, as the same features are
detected regardless of the position of the object.
3. Feature selection: Pooling layers can also help in selecting
the most important features from the input, as max pooling
selects the most salient features and average pooling
preserves more information.
Disadvantages of Pooling Layer:
1. Information loss: One of the main disadvantages of pooling
layers is that they discard some information from the input
feature maps, which can be important for the final
classification or regression task.
2. Over-smoothing: Pooling layers can also cause over-
smoothing of the feature maps, which can result in the loss
of some fine-grained details that are important for the final
classification or regression task.
3. Hyperparameter tuning: Pooling layers also introduce
hyperparameters such as the size of the pooling regions
and the stride, which need to be tuned in order to achieve
optimal performance. This can be time-consuming and
requires some expertise in model building.

Introduction to Padding


During convolution, the size of the output feature map is determined by the size of the input feature map, the size of the kernel, and the stride. If we simply apply the kernel to the input feature map, the output feature map will be smaller than the input. This can result in the loss of information at the borders of the input feature map. In order to preserve the border information, we use padding.
What Is Padding
Padding is a technique used to preserve the spatial dimensions of the input after convolution operations on a feature map. It involves adding extra pixels around the border of the input feature map before convolution.
This can be done in two ways:
 Valid Padding: In the valid padding, no padding is added to
the input feature map, and the output feature map is
smaller than the input feature map. This is useful when we
want to reduce the spatial dimensions of the feature maps.
 Same Padding: In the same padding, padding is added to
the input feature map such that the size of the output
feature map is the same as the input feature map. This is
useful when we want to preserve the spatial dimensions of
the feature maps.
The number of pixels to add for padding can be calculated from the size of the kernel and the desired size of the output feature map. The most common form is zero-padding, which involves adding zeros to the borders of the input feature map.
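
For instance, zero-padding a feature map by one pixel on every border can be sketched with torch.nn.functional.pad (the 8 x 8 input is arbitrary):

    import torch
    import torch.nn.functional as F

    x = torch.randn(1, 1, 8, 8)

    # pad (left, right, top, bottom) with one layer of zeros each
    x_padded = F.pad(x, (1, 1, 1, 1), mode="constant", value=0)
    print(x_padded.shape)  # torch.Size([1, 1, 10, 10])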

Padding can help in reducing the loss of information at the borders of the input feature map and can improve the performance of the model. However, it also increases the computational cost of the convolution operation. Overall, padding is an important technique in CNNs that helps preserve the spatial dimensions of the feature maps and can improve the performance of the model.
Problem With Convolution Layers Without Padding
 For a grayscale (n x n) image and (f x f) filter/kernel, the
dimensions of the image resulting from a convolution
operation is (n – f + 1) x (n – f + 1). For example, for an
(8 x 8) image and (3 x 3) filter, the output resulting after
the convolution operation would be of size (6 x 6). Thus, the
image shrinks every time a convolution operation is
performed. This places an upper limit to the number of
times such an operation could be performed before the
image reduces to nothing thereby precluding us from
building deeper networks.
 Also, the pixels on the corners and the edges are used much less than those in the middle. For example, in an (8 x 8) image convolved with a (3 x 3) filter, a corner pixel A is touched in just one convolution operation and an edge pixel B in 3 convolution operations, while an interior pixel C is touched in 9 convolution operations. In general, pixels in the middle are used more often than pixels on corners and edges. Consequently, the information on the borders of images is not preserved as well as the information in the middle.
Effect Of Padding On Input Images
Padding is simply a process of adding layers of zeros to our input images so as to avoid the problems mentioned above. It changes the input image as follows.
Padding prevents the shrinking of the input image.
If p = the number of layers of zeros added to the border of the image, then an (n x n) image becomes an (n + 2p) x (n + 2p) image after padding, and

[(n + 2p) x (n + 2p) image] * [(f x f) filter] —> [(n + 2p - f + 1) x (n + 2p - f + 1) image]
For example, by adding one layer of padding to an (8 x 8) image
and using a (3 x 3) filter we would get an (8 x 8) output after
performing a convolution operation.
This increases the contribution of the pixels at the border of the
original image by bringing them into the middle of the padded
image. Thus, information on the borders is preserved as well as the
information in the middle of the image.
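
This shrink-versus-preserve behaviour can be checked with a short sketch (hypothetical single-channel layers):

    import torch
    import torch.nn as nn

    x = torch.randn(1, 1, 8, 8)                          # (8 x 8) image

    no_pad = nn.Conv2d(1, 1, kernel_size=3, padding=0)   # no padding
    one_pad = nn.Conv2d(1, 1, kernel_size=3, padding=1)  # p = 1 layer of zeros

    print(no_pad(x).shape)   # torch.Size([1, 1, 6, 6]) -- image shrinks
    print(one_pad(x).shape)  # torch.Size([1, 1, 8, 8]) -- size preserved
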
Types of Padding
Valid Padding: It implies no padding at all. The input image is left in its valid/unaltered shape. So,

[(n x n) image] * [(f x f) filter] —> [(n - f + 1) x (n - f + 1) image]

where (n x n) is the dimension of the input image, (f x f) is the kernel size, (n - f + 1) x (n - f + 1) is the output image size, and * represents the convolution operation.
Same Padding: In this case, we add ‘p’ padding layers such that
the output image has the same dimensions as the input image.
So,

[(n + 2p) x (n + 2p) image] * [(f x f) filter] —> [(n x n) image]

which gives p = (f – 1) / 2 (because n + 2p – f + 1 = n).

So, if we use a (3 x 3) filter on an input image and want the output to have the same dimensions, 1 layer of zeros must be added to the borders for same padding. Similarly, if a (5 x 5) filter is used, 2 layers of zeros must be appended to the border of the image.
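
The p = (f - 1) / 2 rule can be verified for a few odd filter sizes with a short sketch (the 8 x 8 input is arbitrary):

    import torch
    import torch.nn as nn

    x = torch.randn(1, 1, 8, 8)

    for f in (3, 5, 7):
        p = (f - 1) // 2             # same-padding rule for odd filter sizes
        conv = nn.Conv2d(1, 1, kernel_size=f, padding=p)
        print(f, p, tuple(conv(x).shape[2:]))  # output is always (8, 8)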
