Sarma CNN Vce Oct 2022
By
Dr. T. Hitendra Sarma
Associate Professor
Department of IT
Vasavi College of Engineering
Hyderabad
Introduction
What is convolution?
Motivation behind using convolution in a neural network.
What is pooling?
How convolution may be applied to many kinds of data, with different numbers of
dimensions.
Means of making convolution more efficient.
Objective
The goal of this talk is to describe the kinds of tools that convolutional networks
provide.
The general guidelines for choosing which tools to use in which circumstances will
be discussed in the next session.
The building blocks
The Convolution Operation:
Suppose we are tracking the location of a spaceship with a laser sensor. Our laser
sensor provides a single output x(t), the position of the spaceship at time t. Both x
and t are real-valued, i.e., we can get a different reading from the laser sensor at any
instant in time.
Now suppose that our laser sensor is somewhat noisy. To obtain a less noisy
estimate of the spaceship’s position, we would like to average together several
measurements.
Of course, more recent measurements are more relevant, so we will want this to be
a weighted average that gives more weight to recent measurements. We can do this
with a weighting function w(a), where a is the age of a measurement.
Convolution…
If we apply such a weighted average operation at every moment, we obtain a new
function s providing a smoothed estimate of the position of the spaceship:
s(t) = ∫ x(a) w(t − a) da
This operation is called convolution.
The convolution operation is typically denoted with an asterisk: s(t) = (x ∗ w)(t)
In general, convolution is defined for any functions for which the above integral is
defined.
Some Terminology
In convolutional network terminology, the first argument (in the previous example,
the function x) to the convolution is often referred to as the input and the second
argument (in this example, the function w) as the kernel.
The output is sometimes referred to as the feature map.
It might be more realistic to assume that our laser provides a measurement once per
second. The time index t can then take on only integer values. If we now assume that
x and w are defined only on integer t, we can define the discrete convolution:
s(t) = (x ∗ w)(t) = Σ_{a=−∞}^{∞} x(a) w(t − a)
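The discrete convolution can be checked numerically. Below is a minimal NumPy sketch of the smoothing example, with made-up readings and weights (the variable names are mine, not from the slides):

```python
import numpy as np

# Hypothetical noisy position readings x(t), one per second.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])

# Weighting function w(a): more weight on more recent measurements.
w = np.array([0.6, 0.3, 0.1])

# np.convolve computes the sum over a of x(a) * w(t - a);
# mode='valid' keeps only the positions where the window fully overlaps x.
s = np.convolve(x, w, mode='valid')
print(s)  # [1.5 2.5 3.5 4.5]
```

Here w gives weight 0.6 to the current reading, 0.3 to the previous one, and 0.1 to the one before that, so each smoothed value tracks the recent measurements most closely.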
Tensors!
For a two-dimensional image I and a two-dimensional kernel K:
S(i, j) = (I ∗ K)(i, j) = Σ_m Σ_n I(m, n) K(i − m, j − n)
Note that
(K ∗ I)(i, j) = (I ∗ K)(i, j) [Commutative]
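Commutativity is easy to verify numerically; a one-dimensional illustration with arbitrary example values:

```python
import numpy as np

# Arbitrary example sequences.
x = np.array([1.0, 2.0, 3.0, 4.0])
w = np.array([0.5, -1.0, 0.25])

# Convolution is commutative: (x * w)(t) == (w * x)(t) for every t.
assert np.allclose(np.convolve(x, w), np.convolve(w, x))
```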
Motivation
A "beak" detector: the same pattern appears in different places (an "upper-left beak", a "middle beak"), so the detectors can be compressed. Instead of training a lot of such "small" detectors, one for each position, a single detector can "move around" the image.
A convolutional layer
A CNN is a neural network with some convolutional layers (and some other layers). A convolutional layer has a number of filters, each of which performs a convolution operation and acts as a small pattern detector (e.g., a beak detector).
Convolution

These are the network parameters to be learned.

Filter 1 (3 x 3):
 1 -1 -1
-1  1 -1
-1 -1  1

Filter 2 (3 x 3):
-1  1 -1
-1  1 -1
-1  1 -1

6 x 6 image:
1 0 0 0 0 1
0 1 0 0 1 0
0 0 1 1 0 0
1 0 0 0 1 0
0 1 0 0 1 0
0 0 1 0 1 0

Each filter detects a small pattern (3 x 3).
Convolution, stride = 1

Filter 1:
 1 -1 -1
-1  1 -1
-1 -1  1

Slide Filter 1 over the 6 x 6 image one pixel at a time, taking the dot product with each 3 x 3 patch; the first two outputs are 3 and -1.
Zero padding: pad the input volume with zeros in such a way that the conv layer does not alter the spatial dimensions of the input.
Convolution, stride = 2

With stride 2, Filter 1 jumps two pixels at a time over the 6 x 6 image; the first two outputs are 3 and -3.
Convolution, stride = 1

Sliding Filter 1 over the whole 6 x 6 image gives the 4 x 4 output:
 3 -1 -3 -1
-3  1  0 -3
-3 -3  0  1
 3 -2 -2 -1
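The sliding dot product above can be reproduced in a few lines of NumPy. This is my own sketch, not the lecture's code; note that, like most deep learning libraries, it does not flip the kernel, so strictly it computes cross-correlation, which CNNs conventionally call convolution.

```python
import numpy as np

def conv2d(image, kernel, stride=1):
    """Valid sliding dot product (the CNN 'convolution': kernel not flipped)."""
    k = kernel.shape[0]
    out_size = (image.shape[0] - k) // stride + 1
    out = np.zeros((out_size, out_size), dtype=int)
    for i in range(out_size):
        for j in range(out_size):
            patch = image[i * stride:i * stride + k, j * stride:j * stride + k]
            out[i, j] = np.sum(patch * kernel)
    return out

image = np.array([[1, 0, 0, 0, 0, 1],
                  [0, 1, 0, 0, 1, 0],
                  [0, 0, 1, 1, 0, 0],
                  [1, 0, 0, 0, 1, 0],
                  [0, 1, 0, 0, 1, 0],
                  [0, 0, 1, 0, 1, 0]])

filter1 = np.array([[ 1, -1, -1],
                    [-1,  1, -1],
                    [-1, -1,  1]])

# Stride 1 reproduces the 4 x 4 map on the slide (first row: 3 -1 -3 -1);
# stride 2 gives a 2 x 2 map whose first row is 3 -3.
stride1_map = conv2d(image, filter1)
stride2_map = conv2d(image, filter1, stride=2)
```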
Convolution, stride = 1 (repeat this for each filter)

Filter 2:
-1  1 -1
-1  1 -1
-1  1 -1

Filter 1 feature map:    Filter 2 feature map:
 3 -1 -3 -1              -1 -1 -1 -1
-3  1  0 -3              -1 -1 -2  1
-3 -3  0  1              -1 -1 -2  1
 3 -2 -2 -1              -1  0 -4  3

Two 4 x 4 images, forming a 2 x 4 x 4 matrix.
Convolution v.s. Fully Connected

Convolution: the 6 x 6 image is convolved with the 3 x 3 filters (Filter 1 and Filter 2).
Fully connected: the 6 x 6 image is flattened into 36 inputs x1, x2, …, x36, and every hidden unit is connected to all 36 inputs.
Fewer parameters!
Flatten the 6 x 6 image into inputs numbered 1 to 36. The first output of Filter 1, the value 3, connects to only 9 inputs (1, 2, 3, 7, 8, 9, 13, 14, 15), the pixels under the filter, not to all 36 as in a fully connected layer.
Shared weights: even fewer parameters!
The next output of Filter 1, the value -1, connects to the shifted set of 9 inputs (2, 3, 4, 8, 9, 10, 14, 15, 16) using the same 9 weights. Parameter sharing refers to using the same parameter for more than one function in a model.
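The savings can be made concrete with a quick parameter count for this 6 x 6 example (my arithmetic, ignoring biases):

```python
# Parameter counts for the 6 x 6 image with a 3 x 3 filter (biases ignored).
n_inputs = 6 * 6    # 36 flattened pixels
n_outputs = 4 * 4   # 16 output positions per filter

fully_connected = n_inputs * n_outputs  # every output sees every input
locally_connected = n_outputs * 9       # each output sees only its 3 x 3 patch
shared_filter = 9                       # one 3 x 3 filter reused at all positions

print(fully_connected, locally_connected, shared_filter)  # 576 144 9
```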
Pooling

Filter 1 feature map:    Filter 2 feature map:
 3 -1 -3 -1              -1 -1 -1 -1
-3  1  0 -3              -1 -1 -2  1
-3 -3  0  1              -1 -1 -2  1
 3 -2 -2 -1              -1  0 -4  3

Max pooling takes the maximum of each 2 x 2 block: 3 0 / 3 1 for Filter 1, and -1 1 / 0 3 for Filter 2.
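A minimal NumPy sketch of 2 x 2 max pooling applied to the two feature maps above (my own implementation, not the lecture's code):

```python
import numpy as np

def max_pool_2x2(fmap):
    """Maximum over each non-overlapping 2 x 2 block."""
    h, w = fmap.shape
    return fmap.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

map1 = np.array([[ 3, -1, -3, -1],
                 [-3,  1,  0, -3],
                 [-3, -3,  0,  1],
                 [ 3, -2, -2, -1]])
map2 = np.array([[-1, -1, -1, -1],
                 [-1, -1, -2,  1],
                 [-1, -1, -2,  1],
                 [-1,  0, -4,  3]])

pooled1 = max_pool_2x2(map1)  # rows: 3 0 / 3 1, as on the slide
pooled2 = max_pool_2x2(map2)  # rows: -1 1 / 0 3
```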
The whole CNN

Convolution → Max Pooling → Convolution → Max Pooling (can repeat many times) → Flattened → Fully Connected Feedforward network → cat, dog, ……
Why Pooling
Subsampling pixels will not change the object: a subsampled bird is still a bird.
Max Pooling

6 x 6 image → Conv → 4 x 4 feature maps → Max Pooling → a new image, but smaller (a 2 x 2 image per filter):

Filter 1: 3 0 / 3 1     Filter 2: -1 1 / 0 3

Each filter is a channel of the pooling layer's output.
The whole CNN

Convolution followed by Max Pooling produces a new image, smaller than the original, whose number of channels equals the number of filters (here the 2 x 2 maps 3 0 / 3 1 and -1 1 / 0 3). Convolution + Max Pooling can repeat many times.

Finally, the pooled feature maps are flattened into a single vector and fed to a fully connected feedforward network, which produces the output (cat, dog, ……).
CNN in Keras

Only the network structure and the input format are modified (vector → 3-D tensor).

Input_shape = (28, 28, 1)
There are 25 3 x 3 filters in the first convolutional layer and 50 3 x 3 filters in the second, each followed by Max Pooling.

Shapes and parameter counts through the network:
Input: 1 x 28 x 28
Convolution → 25 x 26 x 26 (how many parameters for each filter? 9)
Max Pooling → 25 x 13 x 13
Convolution → 50 x 11 x 11 (how many parameters for each filter? 225 = 25 x 9)
Max Pooling → 50 x 5 x 5
Flattened → 1250
Fully connected feedforward network → Output
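The shapes on this slide follow from simple arithmetic, assuming valid (unpadded) convolutions and 2 x 2 max pooling; a sketch:

```python
def conv_out(n, k=3):
    """Output size of a valid (unpadded) k x k convolution."""
    return n - k + 1

def pool_out(n):
    """Output size of 2 x 2 max pooling."""
    return n // 2

n = 28
n = conv_out(n)    # 26 -> 25 x 26 x 26 after the first conv (25 filters)
n = pool_out(n)    # 13 -> 25 x 13 x 13
n = conv_out(n)    # 11 -> 50 x 11 x 11 after the second conv (50 filters)
n = pool_out(n)    # 5  -> 50 x 5 x 5

flattened = 50 * n * n            # 1250
params_per_filter_1 = 1 * 3 * 3   # 9: first-layer filters see 1 input channel
params_per_filter_2 = 25 * 3 * 3  # 225: second-layer filters see 25 channels
```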
Convolution and Pooling as an Infinitely Strong
Prior
Priors can be considered weak or strong depending on how concentrated the
probability density in the prior is.
A weak prior is a prior distribution with high entropy, such as a Gaussian distribution
with high variance. Such a prior allows the data to move the parameters more or less
freely.
A strong prior has very low entropy, such as a Gaussian distribution with low
variance. Such a prior plays a more active role in determining where the parameters
end up.
One can think of the use of convolution as introducing an infinitely strong prior probability distribution over the parameters of a layer: the prior says that the weights for one hidden unit must be identical to the weights of its neighbor, but shifted in space.
Variants of the Basic Convolution Function
Multi-channel convolution
Strided convolution
Padding:
Valid convolution: no zero padding; the output shrinks with each layer.
Same convolution: enough zero padding that the output keeps the spatial size of the input.
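The difference can be checked with NumPy for the 6 x 6 example: with a 3 x 3 kernel, valid convolution shrinks the output to 4 x 4, while one ring of zeros keeps it 6 x 6.

```python
import numpy as np

n, k = 6, 3
valid_size = n - k + 1                # 4: valid convolution shrinks the output

padded = np.pad(np.zeros((n, n)), 1)  # one ring of zeros -> 8 x 8
same_size = padded.shape[0] - k + 1   # 6: output matches the input size
```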
Note
To transform from the inputs to the outputs in a convolutional layer, we generally also add some bias term to each output before applying the nonlinearity.
For locally connected layers it is natural to give each unit its own bias, and for tiled
convolution, it is natural to share the biases with the same tiling pattern as the
kernels.
For convolutional layers, it is typical to have one bias per channel of the output and
share it across all locations within each convolution map.
However, if the input is of known, fixed size, it is also possible to learn a separate
bias at each location of the output map.
Separating the biases may slightly reduce the statistical efficiency of the model, but
also allows the model to correct for differences in the image statistics at different
locations.
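Sharing one bias per channel across all locations is a simple broadcast; a minimal NumPy sketch with made-up shapes and values:

```python
import numpy as np

# Two output channels of 4 x 4 feature maps (made-up values: zeros),
# with one bias per channel shared across all 16 spatial locations.
feature_maps = np.zeros((2, 4, 4))
bias = np.array([1.0, -1.0])

out = feature_maps + bias[:, None, None]  # broadcast each bias over its map
```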
Convolutional neural network for Image recognition
Dense neural network and Convolutional neural network
A simple CNN structure
https://poloclub.github.io/cnn-explainer/
Video Tutorial
MNIST dataset
http://yann.lecun.com/exdb/lenet/multiples.html
Case studies
GoogLeNet. The ILSVRC 2014 winner was a Convolutional
Network from Szegedy et al. from Google. Its main
contribution was the development of an Inception Module that
dramatically reduced the number of parameters in the
network (4M, compared to AlexNet with 60M). Additionally,
this paper uses Average Pooling instead of Fully Connected
layers at the top of the ConvNet, eliminating a large amount
of parameters that do not seem to matter much. There are
also several followup versions to the GoogLeNet, most
recently Inception-v4.
Case studies
VGGNet. The runner-up in ILSVRC 2014 was the network from Karen
Simonyan and Andrew Zisserman that became known as the VGGNet.
Its main contribution was in showing that the depth of the network is a
critical component for good performance. Their final best network
contains 16 CONV/FC layers and, appealingly, features an extremely
homogeneous architecture that only performs 3x3 convolutions and
2x2 pooling from the beginning to the end. Their pretrained model is
available for plug and play use in Caffe. A downside of the VGGNet is
that it is more expensive to evaluate and uses a lot more memory and
parameters (140M). Most of these parameters are in the first fully
connected layer, and it was since found that these FC layers can be
removed with no performance downgrade, significantly reducing the
number of necessary parameters.
Case studies
https://www.kaggle.com/syamkakarla/traffic-sign-classification-using-resnet
Thank You