UNIT 3 Computer Vision
Introduction to Convolutional Neural Networks
• A Convolutional Neural Network (CNN) is a type of deep neural network used primarily for processing structured grid data such as images, which makes it a standard choice for image processing and computer vision tasks.
• A traditional Artificial Neural Network (ANN) connects every unit in one layer to every unit in the preceding layer (fully connected, or dense), which leads to a high computation requirement.
• CNNs organize each layer into feature maps (which makes the network sparsely connected, explained later). Feature maps can be thought of as parallel planes or channels, with different planes representing different features of the image.
• CNNs address the computation problem by using trainable multi-layer convolutions with filters (filters are feature detectors whose values are learned automatically during training).
Architecture of CNN:
1. Input Layer: Takes raw image data as input.
2. Convolutional Layer: Applies filters to extract features such as edges, textures, and
patterns.
3. Activation Function (ReLU): Introduces non-linearity into the model to enhance
learning capacity.
4. Pooling Layer: Reduces spatial dimensions to decrease computation and prevent
overfitting.
5. Fully Connected Layer: Connects neurons from previous layers to determine final
predictions or classification.
6. Model Training: The network is trained using backpropagation and optimization
techniques such as gradient descent.
7. Output Layer: Produces final classifications or regressions.
This operation is actually a correlation (not a convolution), but the term convolution is used for
simplicity.
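The difference between the two operations can be sketched in a few lines. This is a minimal illustration in plain Python, not code from the slides: the function names, the 3x3 image, and the 2x2 kernel are all hypothetical. Correlation slides the kernel over the image as it is; a true convolution flips the kernel in both directions first.

```python
def correlate2d(image, kernel):
    """Slide the kernel over the image as-is (what CNN layers compute)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[sum(image[i + a][j + b] * kernel[a][b]
                 for a in range(kh) for b in range(kw))
             for j in range(out_w)]
            for i in range(out_h)]


def convolve2d(image, kernel):
    """True convolution: flip the kernel vertically and horizontally first."""
    flipped = [row[::-1] for row in kernel[::-1]]
    return correlate2d(image, flipped)


# Hypothetical toy data, chosen only to make the difference visible.
image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
kernel = [[1, 0],
          [0, -1]]

corr = correlate2d(image, kernel)
conv = convolve2d(image, kernel)
print(corr)
print(conv)
```

Because the filter values are learned during training, the flip makes no practical difference: the network can simply learn the flipped filter.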
Applications of CNN:
1. Image classification (e.g., facial recognition, object detection)
2. Medical diagnosis (e.g., tumour detection in MRI scans).
3. Autonomous driving (e.g., obstacle detection)
4. Natural language processing (e.g., text and speech recognition)
Advantages of CNN
• A CNN learns its filters automatically as part of training. When thousands of training images are supplied, backpropagation determines the values in each filter. The number of filters and the size of each filter are hyperparameters that you specify.
• High accuracy in image classification and pattern recognition tasks.
• Efficient at recognizing spatial hierarchies in images.
Benefits of Convolution, ReLu and Pooling
Sparse Connections: Unlike an ANN, a CNN does not connect every neuron in one layer to every neuron in the next layer. This reduces the number of connections and therefore the number of computations. Here the input matrix (X) is the image and the weight matrix (W) acts as the filter.
Parameter Sharing: The same small set of weights (only 4 in the figure's example) is reused at every position, instead of a different weight for every connection as is usual in an artificial neural network. This is called parameter (weight) sharing, and it reduces the complexity of the CNN.
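The saving from parameter sharing can be counted directly. The sketch below assumes a 3x3 input, a 2x2 filter (4 weights, as in the text), and stride 1; the input size and the function names are illustrative assumptions, not taken from the slides.

```python
def dense_weight_count(n_inputs, n_outputs):
    """Fully connected layer: one separate weight per input-output pair."""
    return n_inputs * n_outputs


def shared_weight_count(kernel_h, kernel_w):
    """Convolution: the same kernel weights are reused at every position."""
    return kernel_h * kernel_w


# Assumed toy sizes: 3x3 input, 2x2 filter, stride 1 -> 2x2 output.
input_h = input_w = 3
kernel_h = kernel_w = 2
output_h = input_h - kernel_h + 1
output_w = input_w - kernel_w + 1

dense = dense_weight_count(input_h * input_w, output_h * output_w)
shared = shared_weight_count(kernel_h, kernel_w)
print("dense weights:", dense)
print("shared conv weights:", shared)
```

Even at this tiny size the dense layer needs nine times as many weights, and the gap grows quickly with image size.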
Examples to illustrate CNN:
Example 1: Digit Recognition:
Let's say you want the computer to recognize a handwritten digit. First we will implement it using an Artificial Neural Network (ANN) and then move to a CNN to overcome the disadvantages of the ANN.
• The issue with this representation is that it is too hardcoded: if the digit 9 is shifted a little, as shown beside, it no longer matches.
• We created a one-dimensional array by flattening the two-dimensional representation of our handwritten digit.
• Then we built a neural network with one hidden layer and an output layer.
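Flattening, and the reason a small shift hurts a plain ANN, can be illustrated with a toy grid. The 4x4 binary "digit" below is a made-up stand-in for a real handwritten image, and the helper name is hypothetical.

```python
def flatten(grid):
    """Turn a 2D grid (list of rows) into a single 1D list, row by row."""
    return [value for row in grid for value in row]


# Hypothetical 4x4 binary image of a stroke, and the same stroke
# shifted one pixel to the right.
digit = [[0, 1, 1, 0],
         [0, 1, 1, 0],
         [0, 0, 1, 0],
         [0, 0, 1, 0]]
shifted = [[0, 0, 1, 1],
           [0, 0, 1, 1],
           [0, 0, 0, 1],
           [0, 0, 0, 1]]

flat = flatten(digit)
flat_shifted = flatten(shifted)

# Count how many input positions changed because of the one-pixel shift.
changed = sum(1 for a, b in zip(flat, flat_shifted) if a != b)
print(len(flat), "inputs,", changed, "of them changed by the shift")
```

Half of the inputs to the network change even though the shape is the same, which is why a fully connected network tied to exact pixel positions handles shifts poorly.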
• Now consider a bigger image, e.g. a picture of a cute little koala of size 1920 by 1080 with 3 RGB channels.
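The arithmetic behind the computation problem is short. The image size comes from the text; the hidden layer size of 1000 neurons is purely an assumption for illustration, since the slides do not state one.

```python
# Image size from the text: 1920 x 1080 pixels, 3 colour channels.
width, height, channels = 1920, 1080, 3

# Number of values after flattening the image into a 1D input vector.
input_values = width * height * channels

# Hypothetical hidden layer size, assumed only for this illustration.
hidden_neurons = 1000

# A fully connected first layer needs one weight per input per neuron.
first_layer_weights = input_values * hidden_neurons

print("input values:", input_values)
print("first-layer weights:", first_layer_weights)
```

Under these assumptions the first layer alone would need over six billion weights, which motivates the convolution and pooling steps that follow.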
How Do Humans Recognize Images So Easily?
• When we look at a koala's image, we look at small features such as its round eyes, its prominent flat black nose, and its fluffy ears. Our brain detects these features one by one, connects them together, and recognizes a koala.
Example 2: Koala Image Recognition
Here a loopy circle pattern, or head, filter (shown in green beside) is convolved with each 3x3 grid of the original image: the individual numbers in the grid are multiplied by the corresponding numbers in the filter.
In CNNs, the weighted sums are performed only within a small local window, as shown here:
Weighted sum: Multiply all the weights in the filter with the corresponding pixel values in the image, then take the sum and average it, as shown below:
Filter/Kernel: It contains the weights.
There are nine numbers in total, and whatever average you get is placed in a grid called the feature map.
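The computation for a single window can be sketched as follows. The filter and window values (+1 and -1 entries forming a loop) are illustrative assumptions, since the original figure is not available; the function name is hypothetical.

```python
def window_score(window, kernel):
    """Multiply matching entries, add them up, and divide by their count."""
    total = sum(w * k
                for window_row, kernel_row in zip(window, kernel)
                for w, k in zip(window_row, kernel_row))
    count = len(kernel) * len(kernel[0])
    return total / count


# Assumed 3x3 "loopy circle" filter: +1 on the ring, -1 in the centre.
loop_filter = [[1, 1, 1],
               [1, -1, 1],
               [1, 1, 1]]

# A window that matches the pattern exactly, and one that is all +1.
matching_window = [[1, 1, 1],
                   [1, -1, 1],
                   [1, 1, 1]]
plain_window = [[1, 1, 1],
                [1, 1, 1],
                [1, 1, 1]]

match_score = window_score(matching_window, loop_filter)
plain_score = window_score(plain_window, loop_filter)
print(match_score, plain_score)
```

A perfect match gives the highest possible score, while the window without the hole in the middle scores lower.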
Let us take a stride of 1.
Stride: The step size by which a filter (kernel) moves across an image or feature map. A higher stride results in a lower-resolution feature map.
Wherever you see a 1, or a number close to 1, in the feature map, it means the loopy circle pattern was matched there. Similarly, the koala's eyes and other parts are detected using filters specific to those features.
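Sliding the filter over the whole image with a given stride can be sketched like this. The 5x5 image and the filter are made-up values in which the loop pattern sits in the top-left corner; the function name is hypothetical.

```python
def feature_map(image, kernel, stride=1):
    """Slide a square kernel over the image; store the averaged weighted sum."""
    k = len(kernel)
    out = []
    for i in range(0, len(image) - k + 1, stride):
        row = []
        for j in range(0, len(image[0]) - k + 1, stride):
            total = sum(image[i + a][j + b] * kernel[a][b]
                        for a in range(k) for b in range(k))
            row.append(total / (k * k))
        out.append(row)
    return out


# Assumed toy image: the loop pattern occupies the top-left 3x3 corner.
image = [[1, 1, 1, -1, -1],
         [1, -1, 1, -1, -1],
         [1, 1, 1, -1, -1],
         [-1, -1, -1, -1, -1],
         [-1, -1, -1, -1, -1]]
loop_filter = [[1, 1, 1],
               [1, -1, 1],
               [1, 1, 1]]

fm_stride1 = feature_map(image, loop_filter, stride=1)
fm_stride2 = feature_map(image, loop_filter, stride=2)
print(len(fm_stride1), "x", len(fm_stride1[0]))
print(len(fm_stride2), "x", len(fm_stride2[0]))
```

With stride 1 the 5x5 image gives a 3x3 feature map; with stride 2 it gives a 2x2 map, which is the lower resolution mentioned above. The entry at the top-left is 1.0, marking where the pattern was found.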
Example 2: Let us look at different filters used to detect a koala.
Even if the eyes are at a different location, they are still detected, because the filter is moved across the whole image. The filters are location invariant: no matter where the eyes are in the image, these filters will detect them.
So we flatten the final 2D feature map into a 1D array, as shown above, and give it to an Artificial Neural Network (ANN) for final classification.
POOLING and UNPOOLING
ReLu and Pooling
There are two other components:
1. “ReLU” activation, which introduces nonlinearity and is cheap to compute, and
2. “Pooling”, which reduces the size of the final feature map and so reduces computation in the final neural network used for classification (explained in the next slides).
ReLU
1. “ReLU” activation:
ReLU(x) = x if x > 0
ReLU(x) = 0 if x <= 0
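As a minimal sketch, ReLU keeps positive values unchanged and replaces zero or negative values with 0. The feature map values below are made up for illustration, and the function names are hypothetical.

```python
def relu(x):
    """Rectified Linear Unit: pass positive values through, clamp the rest to 0."""
    return x if x > 0 else 0


def relu_map(feature_map):
    """Apply ReLU to every entry of a 2D feature map."""
    return [[relu(value) for value in row] for row in feature_map]


# Hypothetical feature map with a mix of positive and negative scores.
scores = [[-0.5, 1.0],
          [0.25, -1.0]]
activated = relu_map(scores)
print(activated)
```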
Pooling( or Downsampling)
We have not yet addressed the issue of too much computation. To reduce it, we introduce pooling in CNNs.
Pooling and Unpooling
2. Pooling (also called downsampling): Pooling is used to reduce the size of the feature map.
There are two main types of pooling:
1. Max Pooling (generally used)
2. Average Pooling
Pooling
Max Pooling: Here you take a 2x2 window and pick the maximum number from each window. This reduces the feature map from “4 by 4” to “2 by 2”, which reduces the computation when the flattened feature map is given to the neural network.
Stride = 2 means that once we are done with one window, we move by two points.
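Pooling with a 2x2 window and stride 2 can be sketched as below. The 4x4 feature map values are invented for illustration, and the function names are hypothetical; the same helper covers average pooling by swapping the reducing function.

```python
def pool2d(feature_map, size=2, stride=2, reduce=max):
    """Slide a size x size window with the given stride and reduce each window."""
    out = []
    for i in range(0, len(feature_map) - size + 1, stride):
        row = []
        for j in range(0, len(feature_map[0]) - size + 1, stride):
            window = [feature_map[i + a][j + b]
                      for a in range(size) for b in range(size)]
            row.append(reduce(window))
        out.append(row)
    return out


def mean(values):
    """Arithmetic mean, used here for average pooling."""
    return sum(values) / len(values)


# Hypothetical 4x4 feature map.
fm = [[1, 3, 2, 1],
      [4, 6, 5, 2],
      [7, 2, 9, 4],
      [1, 0, 3, 8]]

max_pooled = pool2d(fm, size=2, stride=2, reduce=max)
avg_pooled = pool2d(fm, size=2, stride=2, reduce=mean)
print(max_pooled)
print(avg_pooled)
```

Both variants turn the 4x4 map into a 2x2 map, so the network that follows receives 4 values instead of 16.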
Unpooling(or Upsampling)
• Sometimes we need to reverse the pooling process to reconstruct a higher-resolution image. This is where unpooling and transposed convolution come into play.
• Unpooling is the reverse process, where we restore the original size of the image from the pooled representation.
In transposed convolution:
1. The input image is expanded by inserting rows and columns of zeros between existing pixels.
2. A convolutional filter is then applied to this expanded image, filling in the gaps with learned values.
3. The result is a larger, more detailed image.
Figure: Transposed convolution can be used to upsample (increase the size of) an
image. Before applying the convolution operator, (s − 1) extra rows and columns of
zeros are inserted between the input samples, where s is the upsampling stride.
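The zero-insertion view of transposed convolution can be sketched in plain Python. In a real network the filter weights are learned; here a fixed interpolation-style kernel stands in for them, and that kernel, the 2x2 input, and the function names are all assumptions made for illustration.

```python
def insert_zeros(image, s):
    """Insert (s - 1) rows and columns of zeros between the input samples."""
    h, w = len(image), len(image[0])
    out = [[0.0] * ((w - 1) * s + 1) for _ in range((h - 1) * s + 1)]
    for i in range(h):
        for j in range(w):
            out[i * s][j * s] = float(image[i][j])
    return out


def filter_same(image, kernel):
    """Apply the kernel at every pixel, treating pixels outside the image as 0."""
    h, w = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    ph, pw = kh // 2, kw // 2
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            total = 0.0
            for a in range(kh):
                for b in range(kw):
                    y, x = i + a - ph, j + b - pw
                    if 0 <= y < h and 0 <= x < w:
                        total += image[y][x] * kernel[a][b]
            out[i][j] = total
    return out


# Hypothetical 2x2 input and upsampling stride s = 2.
small = [[1, 2],
         [3, 4]]
expanded = insert_zeros(small, 2)

# Fixed stand-in for a learned filter (it interpolates between neighbours).
kernel = [[0.25, 0.5, 0.25],
          [0.5, 1.0, 0.5],
          [0.25, 0.5, 0.25]]
upsampled = filter_same(expanded, kernel)
for row in upsampled:
    print(row)
```

The zeros inserted in the first step are replaced by values computed from neighbouring pixels, giving a larger image than the input.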
Sample Architecture of CNN
Important Parameters of CNN
1. Padding
2. Stride (explained already)
3. Dilation
4. Grouping
1. Padding:
3. Dilation:
Introduces gaps between pixels during convolution, allowing for a larger
receptive field with fewer parameters.
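How padding, stride, and dilation interact can be summarized by the usual output-size calculation for one spatial dimension. The helper below is a sketch with a hypothetical name; the formula itself is the standard one used by common deep learning libraries, not something stated on the slides.

```python
def conv_output_size(n, k, stride=1, padding=0, dilation=1):
    """Output length along one dimension for input length n and kernel size k."""
    # Dilation spreads the k kernel taps apart, enlarging the covered span.
    effective_k = dilation * (k - 1) + 1
    return (n + 2 * padding - effective_k) // stride + 1


# A 5-pixel-wide input with a 3-wide kernel under different settings.
print(conv_output_size(5, 3))                # no padding, stride 1
print(conv_output_size(5, 3, padding=1))     # padding keeps the size
print(conv_output_size(5, 3, stride=2))      # larger stride shrinks output
print(conv_output_size(5, 3, dilation=2))    # kernel now spans 5 pixels
```

Note that the dilated 3-wide kernel covers the same span as a 5-wide kernel while still having only 3 weights per row, which is the "larger receptive field with fewer parameters" point.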
Special cases:
Depthwise Convolution: Each input channel is convolved independently with its own filter.
Special Convolution Techniques
These techniques are designed to improve the performance of CNNs, for example their efficiency and computational cost. The techniques are:
1) 1 X 1 Convolution
2) Partial Convolution
3) Gated Convolution
1X1 Convolution
A special case where filters are 1×1 in
size (single pixel).
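Because a 1x1 filter covers a single pixel, all it can do is combine that pixel's channels. The sketch below assumes an image stored as rows of pixels, each pixel being a list of channel values; the data, the weights, and the function name are hypothetical.

```python
def conv1x1(image, weights):
    """For every pixel, compute one weighted sum of its channels per filter."""
    return [[[sum(w * c for w, c in zip(filt, pixel)) for filt in weights]
             for pixel in row]
            for row in image]


# Hypothetical 1x2 image with 3 channels per pixel.
image = [[[1, 2, 3], [4, 5, 6]]]

# Two 1x1 filters, each with one weight per input channel (3 -> 2 channels).
weights = [[1, 0, 0],
           [0.5, 0.5, 0]]

out = conv1x1(image, weights)
print(out)
```

The spatial size is unchanged while the number of channels goes from 3 to 2, which is why 1x1 convolutions are often used to change the channel count cheaply.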
Partial Convolution
Partial convolutions are a technique designed for tasks like image inpainting, where the goal is to fill in missing or damaged parts of an image (e.g., removing a scratch or reconstructing a hole).
They work by masking out missing pixels and normalizing based on the available pixels.
This makes them well suited to tasks like restoring old photos or removing objects from images.
Gated Convolution
Gated convolutions are an advanced technique where the network dynamically
decides how important each pixel is for the task at hand.
It’s like giving the network a “filtering knob” to control what it pays attention to.
Gated convolution splits the process into two parts:
1. Feature Extraction: A regular convolution extracts features from the input.
2. Gating Mechanism: A second convolution (often followed by a sigmoid
activation) generates a “gate” value between 0 and 1 for each pixel. This gate
value decides how much of the feature to keep.
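The two-part structure described above can be sketched as follows: one convolution produces features, a second produces gate scores, and the sigmoid of the gate scales each feature. The kernels and image are made-up values and the function names are hypothetical; in a real network both kernels are learned.

```python
import math


def correlate2d(image, kernel):
    """Plain sliding-window weighted sum (no padding, stride 1)."""
    kh, kw = len(kernel), len(kernel[0])
    return [[sum(image[i + a][j + b] * kernel[a][b]
                 for a in range(kh) for b in range(kw))
             for j in range(len(image[0]) - kw + 1)]
            for i in range(len(image) - kh + 1)]


def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def gated_conv(image, feature_kernel, gate_kernel):
    """Scale each extracted feature by a gate value between 0 and 1."""
    features = correlate2d(image, feature_kernel)
    gates = correlate2d(image, gate_kernel)
    return [[f * sigmoid(g) for f, g in zip(f_row, g_row)]
            for f_row, g_row in zip(features, gates)]


# Hypothetical input and kernels.
image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
feature_kernel = [[1, 0],
                  [0, 1]]
gate_kernel = [[1, -1],
               [0, 0]]

features = correlate2d(image, feature_kernel)
gated = gated_conv(image, feature_kernel, gate_kernel)
half_open = gated_conv(image, feature_kernel, [[0, 0], [0, 0]])
print(features)
print(gated)
print(half_open)
```

With an all-zero gate kernel every gate equals 0.5, so exactly half of each feature is kept; other gate kernels let more or less of each feature through, pixel by pixel.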
APPLICATION: DIGIT CLASSIFICATION
• One of the most common applications of Convolutional Neural Networks
(CNNs) is digit classification, where a model is trained to recognize handwritten
digits from images.
• This is useful for tasks like recognizing numbers on bank checks, postal codes, and
handwritten forms.
How CNNs Help in Digit Classification
CNNs are a type of deep learning model designed to process visual data. They automatically
learn patterns from images using multiple layers, such as:
• Convolutional Layers: These detect edges, curves, and shapes in the image.
• Activation Functions: These introduce non-linearity to help the network learn complex
patterns. Common activation functions include ReLU (Rectified Linear Unit), which helps
avoid issues like vanishing gradients.
• Pooling Layers: These reduce the size of the feature maps while retaining important
information. Techniques like max pooling (taking the highest value from a region) or
average pooling (taking the mean) help improve efficiency.
• Fully Connected Layers: These process the extracted features and make predictions, such
as determining which digit is present in the image.
• SoftMax Output Layer: This converts the final outputs into probabilities, indicating the
likelihood of the image belonging to each digit category (0-9).
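The softmax step at the end of this pipeline can be sketched in a few lines. The ten raw scores below are invented for illustration, one per digit 0-9, and the function name is a local helper rather than a library call.

```python
import math


def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    # Subtracting the maximum first avoids overflow in exp for large scores.
    largest = max(scores)
    exps = [math.exp(s - largest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical raw outputs of the last fully connected layer, digits 0-9.
raw_scores = [0.1, 0.2, 0.1, 3.5, 0.0, 0.3, 0.1, 1.2, 0.4, 0.2]
probabilities = softmax(raw_scores)
predicted_digit = probabilities.index(max(probabilities))
print(predicted_digit, round(probabilities[predicted_digit], 3))
```

The digit with the largest raw score also gets the largest probability, and the ten probabilities always add up to 1.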
Datasets for Training and Testing
To train a CNN for digit recognition, datasets containing labeled images of handwritten
digits are used. Some common datasets include:
• MNIST: A collection of 60,000 training images and 10,000 test images of handwritten digits (0-9). This is a basic dataset for learning CNNs.
• CIFAR-10: A more challenging dataset containing 60,000 small images across 10 object categories (including vehicles and animals), used for training CNNs beyond digit classification.
• Fashion MNIST: A dataset with images of clothing items instead of digits, often used as a tougher alternative to MNIST.
Real-World Applications of Digit classification:
CNN-based digit recognition is widely used in practical applications, such as:
Today, CNNs are a fundamental part of computer vision, and learning to build a
simple digit classifier is a common first step for students and professionals entering
the field of deep learning.
Figure: Architecture of LeNet-5, a convolutional neural network for digit recognition. This
network uses multiple channels in each layer and alternates multi-channel convolutions with
Downsampling operations, followed by some fully connected layers that produce one
activation for each of the 10 digits being classified.
NETWORK ARCHITECTURES
CNN LENET-5 ARCHITECTURE FOR DIGIT CLASSIFICATION
Figure: Architecture of LeNet-5, a convolutional neural network for digit recognition. This
network uses multiple channels in each layer and alternates multi-channel convolutions with
Downsampling operations, followed by some fully connected layers that produce one
activation for each of the 10 digits being classified.
CNN AlexNet ARCHITECTURE FOR IMAGE CLASSIFICATION
AlexNet is a deep Convolutional Neural Network (CNN) that significantly
improved image classification tasks. It won the ImageNet Large Scale Visual
Recognition Challenge (ILSVRC) in 2012 by achieving a top-5 error rate of
15.3%, far outperforming previous models.
Figure: Architecture of the SuperVision deep neural network (more commonly known as “AlexNet”). The network consists of multiple convolutional layers with ReLU activations, max pooling, some fully connected layers, and a softmax to produce the final class probabilities.
Model Zoos
Explanation of model zoos, transfer learning,
and deep learning frameworks.
What are Model Zoos?
• A model zoo is a collection of pre-trained deep learning models used for
applications like image classification, object detection, and image
segmentation.
• These pre-trained models are typically trained on large datasets, such as
ImageNet.
• These can be fine-tuned for specific tasks without needing to train from
scratch.
• Popular model zoos include:
1. TorchVision (a library of PyTorch)
2. TensorFlow Hub (TensorFlow): A repository of reusable pre-trained models
3. TensorFlow Lite Model Zoo (TensorFlow): For mobile-friendly models
TorchVision (a library in PyTorch) is a model zoo that provides popular deep
learning architectures like:
Neural Architecture Search (NAS)
• Neural Architecture Search (NAS) is an automated approach to design
deep learning models.
• Instead of manually designing architectures through trial and error, NAS
algorithms explore different network structures to find the best-
performing model.
• Popular NAS-generated models:
1. EfficientNet
2. FBNet
3. RegNet
4. RandomNets.
• These models often achieve better accuracy while using fewer
parameters and computations than traditional architectures.
Deep Learning Software and Frameworks
Various software frameworks help build and train deep learning models.
Some of the most widely used are:
1. PyTorch: Flexible for research and production.
2. TensorFlow: Powerful for deployment.
3. Keras: Simplifies deep learning development.
4. MXNet: Used in academia and industry.
• For visualization and debugging, tools like TensorBoard and Visdom
help monitor model training and performance.
Conclusion
Visualizing Weights and Activations
in Neural Networks
Understanding how neural networks process and interpret data
using visualization techniques.
Introduction
• When working with computer vision and deep learning, understanding how
a neural network processes and interprets data is crucial.
• One effective way to do this is through visualization, which helps in
debugging, refining models, and developing an intuition for how the
network operates.
Visualizing Network Weights
• Each connection between neurons has an associated weight, which
determines the strength of influence one neuron has on another.
• Visualizing these weights and activations (responses of neurons) can help
us understand how the network makes decisions.
Displaying Neuron Activations
• Activations are used to show neuron responses to input.
Understanding Neuron Responses
• First-layer neurons detect simple patterns like edges or textures. These can be visualized by directly examining their weights.
• Deeper-layer neurons detect complex features, such as shapes or objects. These can be visualized using deconvolution and guided backpropagation.
To understand their behavior, we can:
• Identify the input patches (regions of an image) that activate them the
most.
• Use deconvolution networks to trace activations back to the original
image, revealing what the neuron is focusing on.
• Apply guided backpropagation, which enhances contrast in these
visualizations for clearer insights.
Activation Mapping Techniques
Several advanced techniques help in identifying which parts of an image
contribute most to a network’s decision:
Interactive Tools for Neural Network
Visualization
To make neural network interpretation more accessible, several tools have
been developed:
• OpenAI's Microscope visualizes neuron behavior.
• Visualization of Pre-trained networks: Models like GoogLeNet help
analyze layer significance in processing images.
Conclusion
Visualization techniques help:
1. Refine models,
2. Diagnose issues, and
3. Improve neural network performance.
Adversarial Examples
Understanding Adversarial Examples
• Adversarial examples are specially crafted inputs designed to deceive
deep learning models into making incorrect predictions.
• These examples look normal to humans but have been subtly altered in a
way that confuses AI systems.
• The modifications are often unnoticeable to the human eye but
significantly impact the model's interpretation.
How Are Adversarial Examples Created?
• To create an adversarial example, a small amount of calculated noise (perturbation) is added to an input (such as an image).
• This noise is determined by backpropagation, which is typically used for training neural networks.
• However, in this case, it is used to adjust the input itself rather than the model's internal weights.
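The idea of adjusting the input rather than the weights can be sketched on a tiny model. The example below uses a single-layer logistic classifier with hand-picked weights and one common recipe (often called the fast gradient sign method); the slides do not name a specific method, so the recipe, the model, the numbers, and the function names are all assumptions for illustration. The perturbation size is also exaggerated so that the effect shows on three inputs.

```python
import math


def predict(weights, bias, x):
    """Probability of class 1 from a one-layer logistic model."""
    z = sum(w * v for w, v in zip(weights, x)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def input_gradient(weights, bias, x, label):
    """Gradient of the cross-entropy loss with respect to the input x."""
    p = predict(weights, bias, x)
    return [(p - label) * w for w in weights]


def sign(v):
    """Return -1, 0, or 1 depending on the sign of v."""
    return (v > 0) - (v < 0)


def adversarial_example(weights, bias, x, label, eps):
    """Nudge every input value by eps in the direction that raises the loss."""
    grad = input_gradient(weights, bias, x, label)
    return [v + eps * sign(g) for v, g in zip(x, grad)]


# Hypothetical fixed model and an input that it classifies as class 1.
weights = [2.0, -3.0, 1.0]
bias = 0.0
x = [0.5, 0.1, 0.2]
true_label = 1
eps = 0.2

x_adv = adversarial_example(weights, bias, x, true_label, eps)
before = predict(weights, bias, x)
after = predict(weights, bias, x_adv)
print("before:", round(before, 3), "after:", round(after, 3))
```

The model's weights never change; only the input moves, by at most eps per value, yet the prediction crosses the 0.5 decision threshold.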
Steps to Create Adversarial Examples
1. Gradient Calculation: The model’s response to an input is analyzed, and
its prediction scores (activations) for different categories are examined.
2. Black Box Attack: The attacker does not have direct access to the
model's internal details and instead relies on trial and error or general
knowledge of how similar models behave.
Why Are Adversarial Examples a
Concern?
Adversarial examples highlight vulnerabilities in AI systems that have not been trained on such adversarial examples, especially in:
4. Colorization
• The model is trained to add realistic colors to black-and-white images.
• Helps it learn object features and textures.