
UNIT-V

Introduction to Deep learning


UNIT-V
Introduction to Deep learning
• Analyze the key computations underlying deep learning.

• Convolutional Neural Network, Building blocks of CNN:
Convolutional layers, Pooling layers, Dense layers.

• Case study using Jetson Nano board.

• Reference:
• Deep Learning, Ian Goodfellow, Yoshua Bengio, Aaron Courville, The MIT Press, 2016.
Convolutional Networks
• Convolutional networks are also known as convolutional neural networks or CNNs.

• CNNs are a specialized kind of neural network for processing data that has a known, grid-like topology.
• Examples:
• time-series data, which can be thought of as a 1D grid taking samples at regular time intervals
• image data, which can be thought of as a 2D grid of pixels.

• CNNs use a convolution operation.
• Convolution is a specialized kind of linear operation.
• CNNs use convolution in place of general matrix multiplication in at least one of their layers.

• The operation used in a CNN does not correspond precisely to the definition of convolution as used in other fields such as engineering or pure mathematics.
Convolution Operation
• Suppose we are tracking the location of a spaceship with a laser sensor.
• Our laser sensor provides a single output x(t), the position of the spaceship at time t.
• Both x and t are real-valued, i.e., we can get a different reading from the laser sensor at any instant in time.

• To obtain a less noisy estimate of the spaceship’s position, we would like to average together several measurements.
• Because more recent measurements are more relevant, we will want this to be a weighted average that gives more weight to recent measurements.
• We can do this with a weighting function w(a), where ‘a’ is the age of a measurement.

• If we apply such a weighted average operation at every moment, we obtain a new function ‘s’ providing a smoothed estimate of the position of the spaceship:
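• In symbols (following the reference, Goodfellow et al., 2016), the smoothed estimate is

s(t) = ∫ x(a) w(t − a) da

where w should be a valid probability density function that is zero for all negative arguments, so that only current and past measurements are averaged.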

• This operation is called convolution.


• The convolution operation is typically denoted with an asterisk:
• the function ‘x’ is referred to as the input
• the function ‘w’ is referred to as the kernel
• the output is referred to as the feature map

• If ‘x’ and ‘w’ are defined only on integer time index ‘t’, we can define the discrete convolution:
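• In the asterisk notation, s(t) = (x ∗ w)(t); for integer ‘t’ the discrete convolution (again following the reference) is

s(t) = (x ∗ w)(t) = Σ_a x(a) w(t − a), with the sum running over all integer values of a.

• A minimal NumPy sketch of this smoothing; the position readings and the 3-tap weighting kernel are made-up values chosen only to illustrate the weighted average:

import numpy as np

# Hypothetical noisy position readings x(t) and a 3-tap weighting kernel w
# that puts the most weight on the most recent measurement (illustrative values).
x = np.array([2.0, 2.1, 2.3, 2.2, 2.5, 2.4, 2.6])
w = np.array([0.5, 0.3, 0.2])            # weights for ages 0, 1, 2; they sum to 1

# np.convolve flips w relative to x, so this is the textbook convolution (x * w).
s = np.convolve(x, w, mode='valid')      # smoothed position estimates
print(s)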
• In practice, we often use convolutions over more than one axis at a time.
• For example, if we use a two-dimensional image ‘I’ as our input, we probably also want to use a two-dimensional kernel ‘K’:
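• Written out (following the reference), the two-dimensional convolution is

S(i, j) = (I ∗ K)(i, j) = Σ_m Σ_n I(m, n) K(i − m, j − n)

and, because convolution is commutative, this is equivalent to the flipped-kernel form

S(i, j) = (K ∗ I)(i, j) = Σ_m Σ_n I(i − m, j − n) K(m, n),

which is usually the more straightforward one to implement.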

• The commutative property of convolution arises because we have flipped the kernel relative to the input.

• The only reason to flip the kernel is to obtain the commutative property.

• The commutative property is useful for writing proofs, but it is NOT an important property of a neural network implementation.
• Instead, many neural network libraries implement a related function called cross-correlation, which is the same as convolution but without flipping the kernel:

• Many machine learning libraries implement cross-correlation but call it convolution.

• We will follow this convention of calling both operations convolution, and specify whether we mean to flip the kernel or not in contexts where kernel flipping is relevant.
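• In two dimensions, the cross-correlation used by these libraries is S(i, j) = Σ_m Σ_n I(i + m, j + n) K(m, n), i.e., the kernel is not flipped.

• A small NumPy/SciPy sketch of the difference; the image and kernel values are arbitrary, and scipy.signal is assumed to be available:

import numpy as np
from scipy.signal import convolve2d, correlate2d

I = np.arange(16, dtype=float).reshape(4, 4)   # toy 4x4 "image"
K = np.array([[1.0, 0.0],
              [0.0, -1.0]])                    # toy 2x2 kernel

conv = convolve2d(I, K, mode='valid')          # true convolution (kernel flipped)
xcorr = correlate2d(I, K, mode='valid')        # cross-correlation (no flip)

# Convolution equals cross-correlation with a 180-degree flipped kernel.
print(np.allclose(conv, correlate2d(I, K[::-1, ::-1], mode='valid')))  # True

• Because the kernel values are learned during training, it makes little practical difference which of the two operations a library implements.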
• Figure below presents an example of convolution (without kernel flipping) applied
to a 2-D tensor.
• Discrete convolution can be viewed as multiplication by a matrix.
• For example, for univariate discrete convolution,
• each row of the matrix is constrained to be equal to the row above shifted by
one element.
• This is known as a Toeplitz matrix.

• In two dimensions, a doubly block circulant matrix corresponds to convolution.
• In addition to these constraints that several elements be equal to each
other, convolution usually corresponds to a very sparse matrix (a matrix
whose entries are mostly equal to zero).
• This is because the kernel is usually much smaller than the input image.

• Convolution works with inputs of variable size.

• Any neural network algorithm that works with matrix multiplication and
does not depend on specific properties of the matrix structure should
work with convolution, without requiring any further changes to the
neural network.
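• A small sketch of this matrix view for univariate ‘valid’ convolution (the input and kernel values are arbitrary): each row of the matrix C below is the row above shifted by one element, most entries are zero, and multiplying by C reproduces np.convolve.

import numpy as np

x = np.array([1.0, 2.0, 0.0, -1.0, 3.0])   # input of length m = 5
w = np.array([0.5, 0.3, 0.2])              # kernel of length k = 3

# Build the (m - k + 1) x m convolution matrix: Toeplitz structure, mostly zeros.
rows = len(x) - len(w) + 1
C = np.zeros((rows, len(x)))
for i in range(rows):
    C[i, i:i + len(w)] = w[::-1]           # flipped kernel, for true convolution

print(np.allclose(C @ x, np.convolve(x, w, mode='valid')))  # True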
• Convolution leverages three important ideas that can help improve a machine
learning system:
• sparse interactions
• parameter sharing
• equivariant representations

• Sparse interactions (sparse connectivity or sparse weights)


• Traditional Neural Networks:
• Use dense matrix multiplication.
• Every output unit interacts with every input unit.
• High memory and computational costs:
• Parameters: m×n (‘m’ inputs and ‘n’ outputs)
• Runtime (per example): O(m×n)

• Convolutional Networks:
• Use sparse interactions (sparse connectivity/weights).
• Small kernels scan local regions (e.g., edges in images).
• Advantages:
• Fewer parameters to store.
• Reduced memory and computational requirements.
• Improved statistical efficiency.
• Efficiency:
• Parameters: k×n (where k≪m)
• Runtime: O(k×n)
• For graphical demonstrations of sparse connectivity, see the figures below:
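• A back-of-the-envelope comparison of the counts listed above, with hypothetical sizes (a one-megapixel input and output and a 3x3 kernel):

m, n, k = 1_000_000, 1_000_000, 9        # hypothetical: 1 MP input, 1 MP output, 3x3 kernel

dense_params  = m * n                    # fully connected layer: every output sees every input
sparse_params = k * n                    # sparse connectivity: each output sees only k inputs
shared_params = k                        # with parameter sharing (next slide), one kernel is stored

print(dense_params, sparse_params, shared_params)   # 10**12 vs 9 * 10**6 vs 9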
• Parameter sharing
• refers to using the same parameter for more than one
function in a model.

• In a traditional neural net,


• each element of the weight matrix is used exactly once when
computing the output of a layer.
• It is multiplied by one element of the input and then never revisited.

• In a convolutional neural net,


• each member of the kernel is used at every position of the input.
• The parameter sharing used by the convolution operation means that
rather than learning a separate set of parameters for every location,
we learn only one set.
• This does NOT affect the runtime of forward propagation—
it is still O(k x n)—but it does further reduce the storage requirements of
the model to k parameters.
• As an example of both of these first two principles in action, the figure shows how sparse
connectivity and parameter sharing can dramatically improve the efficiency of a linear function
for detecting edges in an image.
• Equivariant representations

• To say a function is equivariant means that if the input changes, the output changes in the same way.

• A function f(x) is equivariant to a function g if f(g(x)) = g(f(x)).

• In the case of convolution, if we let g be any function that translates the input, i.e., shifts it, then the convolution function is equivariant to g.

• For example, let I be a function giving image brightness at integer coordinates.
• Let g be a function mapping one image function to another image function, such that I’=g(I) is the image function with I’(x,y)=I(x−1,y).

• This shifts every pixel of I one unit to the right.

• If we apply this transformation to I and then apply convolution, the result will be the same as if we applied convolution to I and then applied the transformation ‘g’ to the output.
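• A quick NumPy check of this equivariance; the image and kernel values are random, np.roll plays the role of the shift g, and circular ‘wrap’ boundaries are used so that the equality is exact at the borders:

import numpy as np
from scipy.signal import correlate2d

I = np.random.rand(6, 6)                 # toy image
K = np.random.rand(3, 3)                 # toy kernel
g = lambda img: np.roll(img, 1, axis=1)  # shift every pixel one unit to the right

# Convolving the shifted image gives the same result as shifting the convolved image.
a = correlate2d(g(I), K, mode='same', boundary='wrap')
b = g(correlate2d(I, K, mode='same', boundary='wrap'))
print(np.allclose(a, b))                 # True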
• When processing time series data,
• the convolution produces a sort of timeline that shows when different features appear in
the input.
• If we move an event later in time in the input, the exact same representation of it will
appear in the output, just later in time.

• With images,
• convolution creates a 2-D map of where certain features appear in the input.
• If we move the object in the input, its representation will move the same amount in the
output.

• This is useful when we know that some function of a small number of neighboring pixels is useful when applied to multiple input locations.
• For example, when processing images, it is useful to detect edges in the first layer of a
convolutional network.
• The same edges appear more or less everywhere in the image, so it is practical to share
parameters across the entire image.

• In some cases, we may NOT wish to share parameters across the entire image.
• For example, if we are processing images that are cropped to be centered on an
individual’s face, we probably want to extract different features at different locations—
the part of the network processing the top of the face needs to look for eyebrows, while
the part of the network processing the bottom of the face needs to look for a chin.
Pooling
• A typical layer of a convolutional network consists of three stages (Figure below ).
• In the first stage, the layer performs several convolutions in parallel to produce a set of linear activations.
• In the second stage (detector stage), each linear activation is run through a nonlinear activation function
(ex: ReLU).
• In the third stage, we use a pooling function to modify the output of the layer further.
• A pooling function replaces the output of the net at a certain location with a summary statistic of the nearby
outputs.
• For example, the max pooling operation reports the maximum output within a rectangular neighborhood.

• Pooling helps to make the representation approximately invariant to small translations of the input.
• Invariance to translation means that if we translate the input by a small amount, the values of most of the
pooled outputs do NOT change.
• See the figure above for an example of how this works.
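• A 1-D sketch of this behaviour; the detector-stage values are invented. Shifting the input moves every detector output, yet several of the max-pooled values stay the same:

import numpy as np

def max_pool_1d(a, width=3):
    # Max over a sliding window of the given width, stride 1.
    return np.array([a[i:i + width].max() for i in range(len(a) - width + 1)])

detector = np.array([0.1, 1.0, 0.2, 0.1, 0.0, 0.3])   # toy detector-stage outputs
shifted = np.roll(detector, 1)                        # circular shift stands in for a small translation

print(max_pool_1d(detector))   # [1.  1.  0.2 0.3]
print(max_pool_1d(shifted))    # [1.  1.  1.  0.2] -- the first two pooled values are unchanged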

• Invariance to local translation can be a very useful property if we care more about whether some feature is present than exactly where it is.
• For example,
• when determining whether an image contains a face, we need NOT know the location of the eyes with pixel-perfect accuracy, we just need to know that there is an eye on the left side of the face and an eye on the right side of the face.
• In other contexts, it is more important to preserve the location of a feature. For example,
• if we want to find a corner defined by two edges meeting at a specific orientation, we need to preserve the location of the edges well enough to test whether they meet.
• The use of pooling can be viewed as adding an infinitely strong prior that the function the layer learns must be invariant to small translations.
• When this assumption is correct, it can greatly improve the statistical efficiency of the network.
• Pooling over spatial regions produces invariance to translation, but if we pool over the
outputs of separately parametrized convolutions, the features can learn which
transformations to become invariant to (see figure below).
• Because pooling summarizes the responses over a whole neighborhood, it is possible to use
fewer pooling units than detector units, by reporting summary statistics for pooling regions
spaced k pixels apart rather than 1 pixel apart.

• Example (Figure below).


• This improves the computational efficiency of the network because the next layer has roughly k times
fewer inputs to process.
• When the number of parameters in the next layer is a function of its input size (such as when the next
layer is fully connected and based on matrix multiplication) this reduction in the input size can also
result in improved statistical efficiency and reduced memory requirements for storing the
parameters.
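• A 1-D sketch of pooling regions spaced k = 2 apart (the feature-map values are invented); the pooled output has roughly k times fewer values for the next layer to process:

import numpy as np

def max_pool_strided(a, width=2, stride=2):
    # One summary statistic per pooling region, with regions spaced `stride` apart.
    return np.array([a[i:i + width].max()
                     for i in range(0, len(a) - width + 1, stride)])

feature_map = np.array([0.1, 1.0, 0.2, 0.1, 0.0, 0.3, 0.4, 0.2])
pooled = max_pool_strided(feature_map)
print(len(feature_map), '->', len(pooled), pooled)   # 8 -> 4 [1.  0.2 0.3 0.4]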
• Some examples of complete convolutional network architectures for classification using convolution and pooling are shown in the figure below:
The importance of CNNs
• CNNs are distinguished from classic machine learning
algorithms such as SVMs and decision trees by their ability to
autonomously extract features at a large scale, bypassing the
need for manual feature engineering and thereby enhancing
efficiency.

• The convolutional layers grant CNNs their translation-invariant characteristics, empowering them to identify and extract patterns and features from data irrespective of variations in position, orientation, scale, or translation.

• A variety of pre-trained CNN architectures, including VGG-16, ResNet50, Inceptionv3, and EfficientNet, have demonstrated top-tier performance. These models can be adapted to new tasks with relatively little data through a process known as fine-tuning.

• Beyond image classification tasks, CNNs can be applied to other domains, such as natural language processing and time series.
Source: An Introduction to Convolutional Neural Networks: A Comprehensive Guide to CNNs in Deep Learning | DataCamp
Key Components of a CNN

• Convolutional layers
• Activation layer (Rectified Linear Unit)
• Pooling layers
• Fully connected layers (Dense layers)
Example: Architecture of a CNN applied to digit recognition
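• A minimal sketch of such a digit-recognition architecture, written with the Keras API purely for illustration; the filter counts, kernel sizes, and the 28x28 grayscale input are assumptions, not values taken from the figure:

import tensorflow as tf
from tensorflow.keras import layers

# Convolution -> ReLU -> pooling blocks, then flatten and dense layers with a softmax output.
model = tf.keras.Sequential([
    layers.Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation='relu'),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(64, activation='relu'),
    layers.Dense(10, activation='softmax'),   # one probability per digit class 0-9
])
model.summary()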
Convolution layers

• Convolution is the application of a sliding window function to a matrix of pixels representing an image.

• The sliding function applied to the matrix is called a kernel or filter.

• Several filters of equal size are applied, and each filter is used to recognize a specific pattern from the image (e.g., curving of the digits, edges, whole shape of the digits).

• For example,
• one filter might be good at finding straight lines,
• another might find curves, and so on.
• Let’s consider a 32x32 grayscale image (0 = black to 255 = white) of a handwritten digit.
• The convolution operation is performed by applying the dot product, and works as follows (a code sketch of these steps is given after the list):

1. Apply the kernel matrix from the top-left corner to the right.
2. Perform element-wise multiplication.
3. Sum the values of the products.
4. The resulting value corresponds to the first value (top-left corner) in the convoluted matrix.
5. Move the kernel down with respect to the size of the sliding window.
6. Repeat steps 1 to 5 until the image matrix is fully covered.
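• A direct sketch of steps 1 to 6 in NumPy (stride 1, no padding; like most libraries, it does not flip the kernel):

import numpy as np

def convolve2d_valid(image, kernel):
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1          # the output shrinks by (kernel size - 1)
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):                   # slide the kernel down the image...
        for j in range(out_w):               # ...and across each row
            window = image[i:i + kh, j:j + kw]
            out[i, j] = np.sum(window * kernel)   # element-wise multiply, then sum
    return out

image = np.random.randint(0, 256, size=(32, 32)).astype(float)   # toy 32x32 grayscale digit
kernel = np.random.rand(3, 3)                                     # toy 3x3 filter
print(convolve2d_valid(image, kernel).shape)                      # (30, 30)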
• The dimension of the convoluted matrix depends on the size of the sliding window.
• The larger the sliding window, the smaller the dimension.

• Another name associated with the kernel in the literature is the feature detector, because its weights can be fine-tuned to detect specific features in the input image.
• For instance:
• A kernel that averages neighboring pixels can be used to blur the input image.
• A kernel that subtracts neighboring pixels is used to perform edge detection.

• The more convolution layers the network has, the better it is at detecting more abstract features.
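• A sketch of the two kernels mentioned above, applied with SciPy; the 3x3 kernel values and the random image are illustrative choices:

import numpy as np
from scipy.signal import correlate2d

blur_kernel = np.ones((3, 3)) / 9.0           # average the neighbouring pixels -> blur
edge_kernel = np.array([[-1.0, -1.0, -1.0],   # subtract neighbours from the centre pixel
                        [-1.0,  8.0, -1.0],   # -> responds strongly at edges
                        [-1.0, -1.0, -1.0]])

image = np.random.rand(8, 8) * 255.0                        # toy grayscale image
blurred = correlate2d(image, blur_kernel, mode='valid')     # smoothed image
edges = correlate2d(image, edge_kernel, mode='valid')       # edge map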
• Activation function

• A Rectified Linear Unit (ReLU) activation function is applied after each convolution operation.

• This function helps the network learn non-linear relationships between the features in the image, hence making the network more robust for identifying different patterns.

• It also helps to mitigate the vanishing gradient problem.
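• ReLU itself is just f(x) = max(0, x); a one-line NumPy sketch:

import numpy as np

relu = lambda z: np.maximum(0.0, z)              # negative activations become 0, positives pass through
print(relu(np.array([-2.0, -0.5, 0.0, 1.5])))    # [0.  0.  0.  1.5]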
• Pooling layer

• The goal of the pooling layer is to pull the most significant features from the convoluted matrix.
• This is done by applying aggregation operations, which reduce the dimension of the feature map (convoluted matrix), hence reducing the memory used while training the network.
• Pooling is also relevant for mitigating overfitting.

• The most common aggregation functions that can be applied are:
• Max pooling, which takes the maximum value of each region of the feature map
• Sum pooling, which corresponds to the sum of all the values of the region
• Average pooling, which is the average of all the values.

• The last pooling layer flattens its feature map so that it can be processed by the fully connected layer.
• Fully connected layers

• These are the last layers of the convolutional neural network, and their inputs correspond to the flattened one-dimensional matrix generated by the last pooling layer.

• ReLU activation functions are applied to them for non-linearity.

• Finally, a softmax prediction layer is used to generate probability values for each of the possible output labels, and the final label predicted is the one with the highest probability score.
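• A short sketch of the final softmax step; the logit values are invented:

import numpy as np

logits = np.array([2.0, 1.0, 0.1])             # hypothetical outputs of the last dense layer
probs = np.exp(logits) / np.exp(logits).sum()  # softmax: non-negative values that sum to 1
print(probs, '-> predicted label:', probs.argmax())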
Example
https://youtu.be/Y1qxI-Df4Lk?si=mPbNJvO5iglUvJ4z
• If Stride = 2 and Padding = 1:
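• With padding p and stride s, an n x n input convolved with an f x f kernel gives an output of size ⌊(n + 2p − f) / s⌋ + 1 per side. For instance, with a hypothetical 5x5 input, a 3x3 kernel, stride 2 and padding 1: ⌊(5 + 2·1 − 3) / 2⌋ + 1 = 3, i.e., a 3x3 output.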
