
Module 4

Convolutional Networks: The Convolution Operation, Motivation, Pooling, Convolution and Pooling as an
Infinitely Strong Prior, Variants of the Basic Convolution Function, Structured Outputs, Data Types,
Efficient Convolution Algorithms, Random or Unsupervised Features.

What is CNN?
Convolutional Neural Networks (ConvNets or CNNs) are a category of neural networks that have proven very
effective in areas such as image recognition and classification. ConvNets have been successful in identifying
faces, objects and traffic signs, apart from powering vision in robots and self-driving cars. ConvNets are therefore an important tool for most machine learning practitioners today.

9.1 The Convolution Operation


In its most general form, convolution is an operation on two functions of a real-valued argument. To motivate the definition of convolution, we start with examples of two functions we might use. Suppose we are tracking the location of a spaceship with a laser sensor. Our laser sensor provides a single output x(t), the position of the spaceship at time t. Both x and t are real-valued, i.e., we can get a different reading from the laser sensor at any instant in time.
Now suppose that our laser sensor is somewhat noisy. To obtain a less noisy estimate of the spaceship's position, we would like to average together several measurements. Of course, more recent measurements are more relevant, so we will want this to be a weighted average that gives more weight to recent measurements. We can do this with a weighting function w(a), where a is the age of a measurement. If we apply such a weighted average operation at every moment, we obtain a new function s providing a smoothed estimate of the position of the spaceship:

s(t) = ∫ x(a) w(t − a) da

This operation is called convolution. The convolution operation is typically denoted with an asterisk:

s(t) = (x * w)(t)
In convolutional network terminology, the first argument (in this example, the function x) to the convolution
is often referred to as the input and the second argument (in this example, the function w) as the kernel. The
output is sometimes referred to as the feature map.
In our example, the idea of a laser sensor that can provide measurements at every instant in time is not realistic. Usually, when we work with data on a computer, time will be discretized, and our sensor will provide data at regular intervals. In our example, it might be more realistic to assume that our laser provides a measurement once per second. The time index t then takes only integer values, and the convolution becomes a sum:

s(t) = Σ_a x(a) w(t − a)
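Assuming the measurements arrive once per second as above, a minimal NumPy sketch of this discrete weighted average (the readings and the weights below are made-up illustrative values):

```python
import numpy as np

# Hypothetical noisy position readings, one per second (illustrative values only).
x = np.array([10.0, 10.4, 9.8, 10.9, 10.2, 10.6, 11.4, 10.8])

# Weighting function w(a): weight for a measurement of age a; recent readings count more.
w = np.array([0.5, 0.3, 0.2])   # ages 0, 1, 2; the weights sum to 1

# Discrete convolution s(t) = sum_a x(t - a) * w(a).
# np.convolve flips the kernel, so this is true convolution, not cross-correlation.
s = np.convolve(x, w, mode="valid")
print(s)   # smoothed position estimates
```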

For example, if we use a two-dimensional image I as our input, we probably also want to use a two-dimensional kernel K:

S(i, j) = (I * K)(i, j) = Σ_m Σ_n I(m, n) K(i − m, j − n)
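A naive, unoptimized sketch of this 2-D formula in NumPy (real implementations use far more efficient algorithms; the toy image and kernel are made up):

```python
import numpy as np

def conv2d(I, K):
    """Naive 'valid' 2-D convolution of image I with kernel K."""
    K = np.flipud(np.fliplr(K))      # flip the kernel so this is convolution, not cross-correlation
    H, W = I.shape
    kh, kw = K.shape
    S = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(S.shape[0]):
        for j in range(S.shape[1]):
            S[i, j] = np.sum(I[i:i + kh, j:j + kw] * K)
    return S

I = np.arange(25, dtype=float).reshape(5, 5)   # toy 5x5 "image"
K = np.array([[0., 1.],
              [2., 3.]])                       # toy 2x2 kernel
print(conv2d(I, K))                            # 4x4 feature map
```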
9.2 Motivation
Motivation for Convolution in Machine Learning
Convolution leverages sparse interactions, parameter sharing and equivariant representations, enhancing
efficiency in machine learning systems. Convolutional layers can handle variable-sized inputs.
1. Sparse Interactions (Sparse Connectivity):
o Sparse interactions mean that each output unit interacts with only a small subset of input units,
rather than all.
o Traditional neural networks use dense connections, where every input interacts with every output,
increasing memory and computation needs.
o Convolutional networks achieve sparse connectivity by using small kernels (e.g., for detecting edges
in images), which limits connections and reduces parameter storage.
o Fewer connections reduce memory requirements and computation time, making convolution
efficient for large inputs like images.
2. Parameter Sharing:
o Parameter sharing means that the same parameters (weights) are reused across different parts of
the input.
o In traditional networks, each parameter in the weight matrix is used once per output calculation.
Convolutional networks, however, apply the same kernel parameters across different input
locations.
o Parameter sharing reduces storage needs, as only a small set of parameters (the k elements of the kernel) is required, rather than a separate set for each location.
o This leads to efficient forward propagation without increasing runtime complexity, as the memory
requirements are minimized.

3. Equivariance to Translation:
o Equivariance means that when an input is transformed (e.g., shifted), the output changes in a
predictable way, typically mirroring the input shift.
o Convolution is translation-equivariant, meaning that shifting the input results in an equivalent shift
in the output.
o For time series data, this creates a timeline of feature occurrences; for images, it creates a 2-D map
showing where features appear.
o This property is useful for detecting recurring patterns (e.g., edges) across the input in a consistent
manner.
Combining sparse connectivity and parameter sharing, convolutional networks efficiently detect edges and
other features across an image.
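As a rough, purely illustrative comparison (the image size below is made up and no biases are counted), the parameter savings can be estimated directly:

```python
# Rough parameter-count comparison for one layer on a hypothetical 280x320 grayscale image.
inputs  = 280 * 320          # 89,600 input pixels
outputs = 280 * 320          # a same-sized output map

dense_params = inputs * outputs   # fully connected: every input connects to every output
conv_params  = 3 * 3              # one shared 3x3 kernel reused at every output location

print(f"dense layer parameters: {dense_params:,}")   # about 8 billion
print(f"conv  layer parameters: {conv_params:,}")    # 9
```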
Handling variable-sized data – Convolution allows for processing inputs of varying sizes, which traditional fixed-shape weight matrices cannot handle efficiently.
Limitations:
✗ Convolution is not naturally equivariant to transformations like scaling or rotation, requiring other techniques to handle these.
✗ Parameter sharing may not be ideal in cases where specific regions of the input (like different parts of a face) need distinct features.
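To make the equivariance property from point 3 concrete, here is a small check; it assumes SciPy is available and uses circular ("wrap") boundaries so that a circular shift commutes exactly with the convolution, while rotation does not:

```python
import numpy as np
from scipy.ndimage import convolve   # assumes SciPy is installed

rng = np.random.default_rng(0)
I = rng.random((6, 6))                # toy image
K = np.array([[-1., 0., 1.]])         # simple horizontal-difference kernel

def shift_right(img):
    return np.roll(img, shift=1, axis=1)   # circular 1-pixel shift to the right

# Translation equivariance: convolving the shifted image equals shifting the convolved image.
a = convolve(shift_right(I), K, mode="wrap")
b = shift_right(convolve(I, K, mode="wrap"))
print(np.allclose(a, b))   # True

# No such guarantee for rotation: rotating the input is not matched by rotating the output.
c = convolve(np.rot90(I), K, mode="wrap")
d = np.rot90(convolve(I, K, mode="wrap"))
print(np.allclose(c, d))   # False in general
```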

9.3 Pooling
A convolutional layer typically has three stages:
 Convolution Stage: Applies multiple convolutions in parallel to produce a set of linear activations.
 Detector Stage: Each linear activation undergoes a nonlinear function (like ReLU).
 Pooling Stage: Applies a pooling function to summarize nearby outputs.

o Complex Terminology: Each convolutional layer has multiple stages (convolution, detector,
pooling).
o Simple Terminology: Each stage in the process is treated as its own layer (e.g., convolution
layer, detector layer, pooling layer).
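A minimal NumPy sketch of one layer in the "complex" terminology above, composing the three stages (the kernel and image are made up, and the convolution stage is written as a cross-correlation for simplicity, as many libraries do):

```python
import numpy as np

def conv_layer(I, K, pool=2):
    """One layer: convolution stage -> detector stage (ReLU) -> pooling stage (max pooling)."""
    # Convolution stage: linear activations from sliding the kernel over the image.
    kh, kw = K.shape
    z = np.zeros((I.shape[0] - kh + 1, I.shape[1] - kw + 1))
    for i in range(z.shape[0]):
        for j in range(z.shape[1]):
            z[i, j] = np.sum(I[i:i + kh, j:j + kw] * K)

    # Detector stage: elementwise nonlinearity.
    a = np.maximum(z, 0.0)

    # Pooling stage: non-overlapping max pooling over pool x pool windows.
    H, W = (a.shape[0] // pool) * pool, (a.shape[1] // pool) * pool
    return a[:H, :W].reshape(H // pool, pool, W // pool, pool).max(axis=(1, 3))

I = np.arange(36, dtype=float).reshape(6, 6)      # toy 6x6 "image"
K = np.array([[-1., 1.],
              [-1., 1.]])                          # toy 2x2 kernel
print(conv_layer(I, K))                            # 2x2 pooled feature map
```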
Pooling in the context of convolutional neural networks (CNNs) is a down-sampling operation that reduces the spatial dimensions of feature maps (i.e., the width and height), which helps control overfitting, reduces the computational load, and makes the model more translation invariant. Pooling essentially replaces the output at a given location with a summary statistic of the nearby outputs, thereby condensing the information.
Types of Pooling:
 Max Pooling: Captures the maximum value in a specified neighbourhood.
 Average Pooling: Takes the average within a neighbourhood.
 L2 Norm Pooling: Uses the L2 norm of values within the region.
 Weighted Average: Averages values with weights based on proximity to the central pixel.
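A small sketch comparing these summary statistics on a single 2x2 neighbourhood (the values and weights are made up for illustration):

```python
import numpy as np

# One 2x2 pooling neighbourhood taken from some feature map (illustrative values).
region = np.array([[0.1, 0.9],
                   [0.4, 0.2]])

print("max pooling:     ", region.max())                    # 0.9
print("average pooling: ", region.mean())                   # 0.4
print("L2 norm pooling: ", np.sqrt(np.sum(region ** 2)))    # ~1.01

# Weighted average, with a made-up weighting that favours one corner of the window.
weights = np.array([[0.4, 0.2],
                    [0.2, 0.2]])
print("weighted average:", np.sum(region * weights))        # 0.34
```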

Efficiency of Edge Detection with Convolution


Convolution is computationally efficient, making operations like edge detection feasible without extensive
computations.
Example: An edge-detection kernel can significantly reduce floating-point operations compared to matrix
multiplication, performing the same transformation using fewer resources.
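As a back-of-the-envelope illustration: differencing horizontally adjacent pixels detects vertical edges, and the operation-count figures below are rough estimates for a hypothetical 280x320 image, not measurements:

```python
import numpy as np

H, W = 280, 320
image = np.random.default_rng(0).random((H, W))   # hypothetical grayscale image

# Vertical-edge detection: difference of horizontally adjacent pixels,
# equivalent to convolving with a 2-element kernel.
edges = image[:, 1:] - image[:, :-1]
print(edges.shape)                                # (280, 319)

conv_ops  = H * (W - 1) * 3                       # roughly 3 ops per output pixel
dense_ops = (H * W) * (H * (W - 1))               # multiplications if done as a full matrix multiply
print(f"convolution:           ~{conv_ops:,} operations")    # ~268 thousand
print(f"dense matrix multiply: ~{dense_ops:,} operations")    # ~8 billion
```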

Importance and Uses of Pooling in Convolutional Neural Networks


1. Translation Invariance
Pooling introduces invariance to small translations in the input, which is beneficial in applications where the exact location of a feature is less important. For example, in face detection, we care that there are eyes roughly on the left and right of the face, not about their exact pixel locations. This invariance allows the network to generalize better when detecting objects or patterns in various locations (a minimal sketch of this effect follows this list).

2. Pooling as a Prior for Invariance


By using pooling, the network essentially learns a strong prior that the features should be invariant
to small translations. This enforced property can improve the efficiency of the network by focusing
on the presence of features, even if they shift slightly. This prior is especially useful when tasks
require detecting objects in different positions within an image.
3. Pooling Across Spatial Regions
Pooling reduces the number of output values, which in turn decreases computational and memory
demands for later layers. For instance, spacing pooling regions k pixels apart (instead of every pixel) reduces the number of outputs roughly by a factor of k, making the network more efficient. This
strategy is particularly helpful in handling large inputs.
4. Handling Variable Input Sizes
Pooling is essential for processing inputs of varying sizes by producing a consistent output size. In
image classification, for example, pooling can summarize information in each quadrant of an image,
maintaining a fixed-size output even if the input images vary in dimensions. This capability enables
CNNs to adapt more flexibly to diverse inputs without altering network architecture.
5. Downsampling with Pooling
Pooling inherently downsamples the input, which reduces the size of the representation and lowers
the computational load in the following layers. Downsampling helps reduce network complexity and
focuses the model’s attention on the most critical features.

6. Advanced Pooling Strategies


Adaptive pooling techniques, such as using clustering algorithms or learning pooling regions
dynamically, allow pooling regions to adjust based on the feature location. These advanced strategies
make CNNs more flexible, enabling them to adapt pooling regions to suit the specific features of each
image. This customization improves network accuracy and performance on varied inputs.
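To make the translation-invariance point (1) above concrete, here is a minimal max-pooling sketch; the feature map is made up, and the invariance holds only for shifts that stay within a pooling window:

```python
import numpy as np

def max_pool(x, size=2):
    """Non-overlapping max pooling over size x size windows."""
    H, W = (x.shape[0] // size) * size, (x.shape[1] // size) * size
    return x[:H, :W].reshape(H // size, size, W // size, size).max(axis=(1, 3))

# A feature map with a single strong activation (a detected "feature")...
fmap = np.zeros((4, 4))
fmap[1, 0] = 1.0

# ...and the same feature shifted one pixel to the right, still inside the same 2x2 window.
shifted = np.zeros((4, 4))
shifted[1, 1] = 1.0

print(max_pool(fmap))      # [[1. 0.] [0. 0.]]
print(max_pool(shifted))   # identical: the pooled output is unchanged by this small shift
```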
