Aiml Ece Unit-5
• Reference:
• Deep Learning, Ian Goodfellow, Yoshua Bengio, Aaron Courville, The MIT
Press, 2016
Convolutional Networks
• Convolutional networks are also known as convolutional neural networks,
or CNNs.
• CNNs are a specialized kind of neural network for processing data that
has a known, grid-like topology.
• Examples:
• time-series data, which can be thought of as a 1D grid taking samples at
regular time intervals
• image data, which can be thought of as a 2D grid of pixels.
• If ‘x’ and ‘w’ are defined only on integer time index ‘t’, we can
define the discrete convolution:
• s(t) = (x ∗ w)(t) = Σ_a x(a) w(t − a)
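As a sketch, this discrete convolution can be computed with NumPy's `np.convolve`; the signal and kernel values below are illustrative, not from the text:

```python
import numpy as np

# Discrete convolution s(t) = sum_a x(a) w(t - a).
# x is the input signal, w the kernel (names follow the text above).
x = np.array([1.0, 2.0, 3.0, 4.0])
w = np.array([0.5, 0.25])

# np.convolve implements exactly this sum (it flips the kernel internally).
s = np.convolve(x, w, mode="full")
print(s)  # [0.5  1.25 2.   2.75 1.  ]
```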
• We often use convolutions over more than one axis at a time.
• For example, if we use a two-dimensional image ‘I’ as our input, we
also want to use a two-dimensional kernel ‘K’:
• S(i, j) = (I ∗ K)(i, j) = Σ_m Σ_n I(m, n) K(i − m, j − n)
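A minimal NumPy sketch of this two-dimensional convolution; the kernel flip distinguishes true convolution from the cross-correlation many libraries implement, and the image and kernel values are illustrative:

```python
import numpy as np

def conv2d(I, K):
    # "Valid" 2-D convolution S(i,j) = sum_m sum_n I(m,n) K(i-m, j-n),
    # implemented as a sliding sum with the kernel flipped.
    Kf = K[::-1, ::-1]  # flip: convolution, not cross-correlation
    h, w = Kf.shape
    out = np.zeros((I.shape[0] - h + 1, I.shape[1] - w + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(I[i:i+h, j:j+w] * Kf)
    return out

I = np.arange(16.0).reshape(4, 4)           # illustrative 4x4 "image"
K = np.array([[1.0, 0.0], [0.0, -1.0]])     # illustrative 2x2 kernel
print(conv2d(I, K))                          # every entry is 5.0
```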
• Any neural network algorithm that works with matrix multiplication and
does not depend on specific properties of the matrix structure should
work with convolution, without requiring any further changes to the
neural network.
• Convolution leverages three important ideas that can help improve a machine
learning system:
• sparse interactions
• parameter sharing
• equivariant representations
• Convolutional Networks:
• Use sparse interactions (sparse connectivity/weights).
• Small kernels scan local regions (e.g., edges in images).
• Advantages:
• Fewer parameters to store.
• Reduced memory and computational requirements.
• Improved statistical efficiency.
• Efficiency:
• Parameters: k×n instead of m×n (each of the n outputs connects to only k of the m inputs, with k ≪ m)
• Runtime: O(k×n) instead of O(m×n)
• For graphical demonstrations of sparse connectivity, see the figures below.
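These savings can be checked with simple arithmetic; the layer sizes below are illustrative, not from the text:

```python
# Parameter counts for a layer with m inputs and n outputs.
m, n, k = 1024, 1024, 3   # illustrative sizes; k-wide kernel, k << m

dense_params = m * n      # fully connected: every input feeds every output
sparse_params = k * n     # sparse connectivity: each output sees only k inputs
shared_params = k         # parameter sharing: one k-wide kernel reused everywhere

print(dense_params, sparse_params, shared_params)  # 1048576 3072 3
```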
• Parameter sharing
• refers to using the same parameter for more than one
function in a model.
• With images,
• convolution creates a 2-D map of where certain features appear in the input.
• If we move an object in the input, its representation moves by the same amount in the
output (equivariance to translation).
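This equivariance property can be demonstrated with a small sketch; the signal and kernel below are illustrative:

```python
import numpy as np

# Convolution is equivariant to translation: shifting the input by one
# step shifts the feature map by the same amount.
x = np.array([0.0, 0.0, 1.0, 0.0, 0.0, 0.0])   # an "object" at position 2
x_shift = np.roll(x, 1)                         # move it one step right
w = np.array([1.0, -1.0])                       # simple edge-detector kernel

y = np.convolve(x, w, mode="full")
y_shift = np.convolve(x_shift, w, mode="full")

# The shifted input's feature map equals the original map shifted by one.
print(np.allclose(y_shift[1:], y[:-1]))  # True
```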
• In some cases, we may NOT wish to share parameters across the entire image.
• For example, if we are processing images that are cropped to be centered on an
individual’s face, we probably want to extract different features at different locations—
the part of the network processing the top of the face needs to look for eyebrows, while
the part of the network processing the bottom of the face needs to look for a chin.
Pooling
• A typical layer of a convolutional network consists of three stages (figure below).
• In the first stage, the layer performs several convolutions in parallel to produce a set of linear activations.
• In the second stage (detector stage), each linear activation is run through a nonlinear activation function
(e.g., ReLU).
• In the third stage, we use a pooling function to modify the output of the layer further.
• A pooling function replaces the output of the net at a certain location with a summary statistic of the nearby
outputs.
• For example, the max pooling operation reports the maximum output within a rectangular neighborhood.
• Pooling helps to make the representation approximately invariant to small translations of the input.
• Invariance to translation means that if we translate the input by a small amount, the values of most of the
pooled outputs do NOT change.
• See the figure above for an example of how this works.
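Max pooling can be sketched directly in NumPy; the `max_pool` helper and the input values below are illustrative:

```python
import numpy as np

def max_pool(x, size=2, stride=2):
    # Each output is the maximum over a size x size neighborhood,
    # stepped by `stride` (non-overlapping windows here).
    rows = (x.shape[0] - size) // stride + 1
    cols = (x.shape[1] - size) // stride + 1
    out = np.zeros((rows, cols))
    for i in range(rows):
        for j in range(cols):
            patch = x[i*stride:i*stride+size, j*stride:j*stride+size]
            out[i, j] = patch.max()
    return out

x = np.array([[1., 3., 2., 0.],
              [4., 2., 1., 5.],
              [0., 1., 3., 2.],
              [2., 0., 1., 4.]])
print(max_pool(x))  # [[4. 5.] [2. 4.]]
```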
• A typical CNN stacks the following layer types:
• Convolutional layers
• Activation layers (Rectified Linear Unit, ReLU)
• Pooling layers
• Fully connected layers (dense layers)
Example: Architecture of a CNN applied to
digit recognition
Convolution layers
• For example,
• one filter might be good at finding straight lines,
another might find curves, and so on.
• Let’s consider this 32×32 grayscale image (0 = black to 255 = white) of a handwritten
digit.
• Perform the convolution operation by
applying the dot product at each location, which works as follows:
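A sketch of this sliding dot product, assuming an illustrative 3×3 vertical-edge filter and a random 32×32 image standing in for the actual digit:

```python
import numpy as np

# Slide a 3x3 filter over a 32x32 grayscale image (values 0-255),
# taking the dot product of the filter with each image patch.
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(32, 32)).astype(float)  # stand-in digit
filt = np.array([[-1., 0., 1.],
                 [-1., 0., 1.],
                 [-1., 0., 1.]])   # illustrative vertical-edge filter

out = np.zeros((30, 30))           # 32 - 3 + 1 = 30 per axis ("valid")
for i in range(30):
    for j in range(30):
        patch = image[i:i+3, j:j+3]
        out[i, j] = np.dot(patch.ravel(), filt.ravel())

print(out.shape)  # (30, 30)
```

Each output value is one dot product; stacking many such filters gives one feature map per filter, as described above.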