Convolutional Neural Network

Why CNNs?
Topics
General and biological motivation

Hand-coded to learnt filters

Understanding Convolution Operation

CNNs over Feed Forward Neural Networks

Different layers in a CNN (convolution, pooling, ReLU, etc.)

CNNs for Regression

CNN for Classification meets CNN for Regression


Biological motivation - Mammalian vision system.

Hubel and Wiesel (1959) Experimental setup


1981 Nobel Prize

Suggested a ‘hierarchy’ of feature detectors in the mammalian visual cortex.


Biological motivation - Mammalian vision system.

Simple cells:
1. Activity characterized by a linear function of the image.
2. Operate in a spatially localized (SL) receptive field.
3. Each set responds to edges of a different orientation.

Complex cells:
1. Operate in a large SL receptive field.
2. Receive input from lower-level simple cells.

Hyper-complex cells:
1. Larger receptive field.
2. Receive input from lower-level complex cells.
Biological motivation - Grandmother cell
The grandmother cell, proposed by cognitive scientist Jerry Lettvin in 1969, is a hypothetical neuron that represents a complex but specific concept or object.
Biological motivation - CNN.
Back-propagation [Lang and Hinton, 1988], and modern CNN [LeCun et al., 1989]

CNN proposed by LeCun et al. for document recognition.


CNN for document recognition [LeCun et al., 1989].

All images are 28x28 grayscale.

60k training examples.

10k test examples.

Output value is an integer from 0 to 9.

[Figure: network architecture showing Input, Layer 1, Layer 3, and Layer 5]
CNN for document recognition [LeCun et al., 1989].

Invariances demonstrated: translation, rotation, scale, squeeze, stroke-width, and noise.


Then why didn't DL take off in the '90s?

LeCun, Y., Boser, B., Denker, J. S., Henderson, D., Howard, R. E., Hubbard, W., et al. "Backpropagation applied to handwritten zip code recognition." Neural Computation 1(4), 541-551, 1989.

LeCun, Y., Boser, B., Denker, J. S., Henderson, D., Howard, R. E., Hubbard, W., et al. "Handwritten digit recognition with a back-propagation network." Advances in Neural Information Processing Systems 2 (NIPS 1989), 396-404.
Then why didn't DL take off in the '90s?

1. Limited big data availability.

2. Limited computational power to crunch the data.
Why is DL trending now?

Big data availability:
1. One trillion images.
2. 350 million images uploaded per day.
3. 100 hours of video uploaded per minute.
4. 2.5 petabytes of data every minute.

Computational power to crunch data:
1. Parallel processing units - GPUs.


When/how was deep learning reclaimed?
Traditional ML
Topics
General and biological motivation

Hand-coded to learnt filters

Understanding Convolution Operation

CNNs over Feed Forward Neural Networks

Different layers in a CNN (convolution, pooling, ReLU, etc.)

CNNs for Regression

CNN for Classification meets CNN for Regression


Traditional machine learning

Pipeline: Raw data → Feature extraction → Classifier/detector (clustering, shallow neural network, etc.) → Result (detection, speaker ID, speech translation, machine translation, etc.)
Features: Classical

1. Filter banks
2. Edges and corners: Sobel, LoG, and Canny
3. PCA/subspaces
4. Histograms of responses
5. Different transforms (Fourier/wavelet)
Deep learning: Training phase

A labelled dataset (usually millions of examples) and its labels are fed through a network with unoptimized weights; errors are backpropagated to optimize the weights.

Deep learning: Deployment

The network (with trained weights) is applied to the task, e.g., pedestrian detection (for automatic braking).

Traditional ML vs Deep learning: Face detection
Traditional machine learning
Traditional ML vs Deep learning: Face detection
Deep learning

Face: p = 0.94

Input → low-level features → mid-level features → high-level features → output node


Deep learning benefits over traditional ML
Robust
1. No need to design the features ahead of time – features are automatically learned to be optimal for the task at hand.
2. Robustness to natural variations in the data is automatically learned.

Generalizable
1. The same neural net approach can be used for many different applications and data types (e.g., hand-crafted face features cannot be used for pedestrian detection, whereas the same CNN architecture can be used for both).

Scalable
1. Performance improves with more data, and can be leveraged by massive parallelization on GPUs.
Topics
General and biological motivation

Hand-coded to learnt filters

Understanding Convolution Operation

CNNs over Feed Forward Neural Networks

Different layers in a CNN (convolution, pooling, ReLU, etc.)

CNNs for Regression

CNN for Classification meets CNN for Regression


What is a Convolution operation?
Image representation
Convolution operation detecting edges
Convolution operations: Examples
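As a concrete illustration of the convolution operation detecting edges, here is a minimal NumPy sketch (not from the slides; the toy image and the hand-coded Sobel-style kernel are illustrative assumptions):

```python
import numpy as np

def conv2d(image, kernel):
    """Valid 2D convolution (implemented as cross-correlation, as in most CNN libraries)."""
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# Toy 6x6 image: dark left half, bright right half (a vertical edge).
image = np.zeros((6, 6))
image[:, 3:] = 1.0

# Hand-coded Sobel-style kernel that responds to vertical edges.
sobel_x = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]], dtype=float)

print(conv2d(image, sobel_x))  # large responses only in the columns around the edge
```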
Topics
General and biological motivation

Hand-coded to learnt filters

Understanding Convolution Operation

CNNs over Feed Forward Neural Networks

Different layers in a CNN (convolution, pooling, ReLU, etc.)

CNNs for Regression

CNNs for Classification meets CNNs for Regression


CNNs over Feed Forward Neural Networks

Multi-layer neural network

CNNs are multi-layer neural networks with two constraints:

1. Local connectivity
2. Parameter sharing
Intuition behind CNN (over MLP)
CNNs are multi-layer neural networks with two constraints:
1. Local connectivity:
a. Can extract elementary features such as edges, end-points, and corners.
b. These features are combined by subsequent layers to detect higher-order features.

2. Parameter sharing:
a. Elementary feature detectors useful in one part of an image may be useful in other parts as well.
CNN: Local connectivity (LC)
Hidden layer (3 nodes)

Input layer (7 nodes)

MLNN (7 x 3 = 21 parameters) vs. MLNN-LC (3 x 3 = 9 parameters)

MLNN-LC is 2.3x more runtime- and storage-efficient.

In general, for a level with m input nodes, n output nodes, and local connectivity of k nodes (k < m):

MLNN:
1. m x n parameters to store.
2. O(m x n) runtime.

MLNN-LC:
1. k x n parameters to store.
2. O(k x n) runtime.
CNN: Parameter sharing (PS)

MLNN (21 parameters) vs. MLNN-LC (3 x 3 = 9 parameters) vs. MLNN-LC-PS (3 parameters)

MLNN-LC is 2.3x more runtime- and storage-efficient; MLNN-LC-PS is 2.3x faster and 7x more storage-efficient.

In general, for a level with m input nodes, n output nodes, and local connectivity of k nodes (k < m):

MLNN:
1. m x n parameters to store.
2. O(m x n) runtime.

MLNN-LC:
1. k x n parameters to store.
2. O(k x n) runtime.

MLNN-LC-PS:
1. k parameters to store.
2. O(k x n) runtime.
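A small sketch (mine, not from the slides) that makes the parameter counts above concrete for the 7-input / 3-output example with a local window of k = 3:

```python
# Parameter counts for a layer with m inputs, n outputs, and local window k.
def param_counts(m, n, k):
    fully_connected = m * n          # MLNN: every input connects to every output
    locally_connected = k * n        # MLNN-LC: each output sees only k inputs
    shared = k                       # MLNN-LC-PS: one shared filter of size k
    return fully_connected, locally_connected, shared

print(param_counts(m=7, n=3, k=3))   # (21, 9, 3), matching the slide example
```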
CNN with multiple input channels

[Figure: single input channel vs. two input channels]

CNN with multiple output maps

[Figure: single input map vs. two output maps]

A generic level of CNN

[Figure: a generic CNN level combines local connectivity and parameter sharing across multiple input channels and output maps]
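To make the channel/map bookkeeping of a generic CNN level concrete, here is a hedged NumPy sketch; the channel count, map count, filter size, and spatial size are my own illustrative choices, not values from the slides:

```python
import numpy as np

D1, K, F = 3, 8, 3          # input channels, output maps, filter size (illustrative)
H, W = 32, 32               # spatial size of the input (illustrative)

x = np.random.randn(D1, H, W)           # one input volume
filters = np.random.randn(K, D1, F, F)  # K filters, each spanning all D1 channels
biases = np.zeros(K)

# Valid convolution: each output map sums contributions from every input channel.
Ho, Wo = H - F + 1, W - F + 1
out = np.zeros((K, Ho, Wo))
for k in range(K):
    for i in range(Ho):
        for j in range(Wo):
            out[k, i, j] = np.sum(x[:, i:i + F, j:j + F] * filters[k]) + biases[k]

print(out.shape)  # (8, 30, 30): K output maps, each spatially shrunk by F - 1
```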


Topics
General and biological motivation

Hand-coded to learnt filters

Understanding Convolution Operation

CNNs over Feed Forward Neural Networks

Different layers in a CNN (convolution, pooling, ReLU, etc.)

CNNs for Regression

CNN for Classification meets CNN for Regression


Different layers of CNN architecture
CNN: Convolutional layer

1. To reduce the number of weights (through local connectivity).


2. To provide spatial invariance (through parameter sharing).
A closer look at CNN filters.
Hyperparameters for the convolutional layer.
1. Zero padding (to control the spatial size of the output).

Without padding (i.e., [0,0]) vs. with padding [2,2]


Hyperparameters for the convolutional layer.
2. Stride (to produce smaller output volumes spatially).

Without stride (i.e., [1,1]) vs. with stride [2,2]


Hyperparameters for the convolutional layer.
Both padding and stride

Without padding and stride vs. with padding [1,1] & stride [2,2]
CONVOLUTIONAL LAYER
1. Accepts a volume of size W1 X H1 X D1.
2. Requires four hyperparameters:
a. Number of filters K
b. their spatial extent F
c. their stride S
d. the amount of zero padding P
3. Produces an output volume of size W2 X H2 X D2 where:
W2=(W1−F+2P)/S+1, H2=(H1−F+2P)/S+1, D2=K
4. With parameter sharing, it introduces F⋅F⋅D1 weights per filter, for a total of
(F⋅F⋅D1)⋅K weights and K biases.
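A quick sanity check of the output-size and parameter formulas above, as a small sketch (the example numbers are mine, chosen only for illustration):

```python
# Output size of a conv layer: W2 = (W1 - F + 2P)/S + 1, depth D2 = K.
def conv_output_size(w1, h1, d1, k, f, s, p):
    w2 = (w1 - f + 2 * p) // s + 1
    h2 = (h1 - f + 2 * p) // s + 1
    d2 = k                        # depth equals the number of filters
    weights = (f * f * d1) * k    # shared weights across spatial positions
    biases = k
    return (w2, h2, d2), weights, biases

# Example: a 227x227x3 input, 96 filters of size 11, stride 4, no padding
# (AlexNet-like numbers, used here purely as an illustration).
print(conv_output_size(227, 227, 3, k=96, f=11, s=4, p=0))
# ((55, 55, 96), 34848, 96)
```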
Hyperparameters for the convolutional layer.
Dilation

Vanilla convolution vs. with dilation
CONVOLUTIONAL LAYER
1. Accepts a volume of size W1 X H1 X D1.
2. Requires four hyperparameters:
a. Number of filters K
b. their spatial extent F
c. their stride S
d. the amount of zero padding P
3. Produces an output volume of size W2 X H2 X D2 where:
W2=(W1−F+2P)/S+1, H2=(H1−F+2P)/S+1, D2=K — Exercise
4. With parameter sharing, it introduces F⋅F⋅D1 weights per filter, for a total of
(F⋅F⋅D1)⋅K weights and K biases.
Different layers of CNN architecture
CNN: Pooling layer

1. To reduce the spatial size of the representation, which reduces the number of parameters and the amount of computation in the network.
2. Average pooling or L2 pooling can also be used, but they are not as popular as max pooling.
POOLING LAYER
1. Accepts a volume of size W1 X H1 X D1.
2. Requires two hyperparameters:
a. their spatial extent F
b. their stride S
(zero padding is uncommon for pooling; commonly P = 0)
3. Produces an output volume of size W2 X H2 X D2 where:
W2=(W1−F+2P)/S+1, H2=(H1−F+2P)/S+1, D2=D1
4. Introduces zero parameters since it computes a fixed function of the input.
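A minimal max-pooling sketch (mine, not from the slides), assuming the common 2x2 window with stride 2:

```python
import numpy as np

def max_pool2d(x, f=2, s=2):
    """Max pooling over each channel independently; x has shape (D, H, W)."""
    d, h, w = x.shape
    ho, wo = (h - f) // s + 1, (w - f) // s + 1
    out = np.zeros((d, ho, wo))
    for i in range(ho):
        for j in range(wo):
            out[:, i, j] = x[:, i * s:i * s + f, j * s:j * s + f].max(axis=(1, 2))
    return out

x = np.random.randn(3, 8, 8)
print(max_pool2d(x).shape)  # (3, 4, 4): depth unchanged, spatial size halved
```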
Different layers of CNN architecture
Recap: Gradient descent
Recap: Backpropagation
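The recap slides carry only their titles here; as a minimal, hedged sketch of the update rule that backpropagated gradients feed into, here is one-variable gradient descent on an illustrative quadratic loss (the loss, starting point, and learning rate are all assumed values):

```python
# Gradient descent on an illustrative loss L(w) = (w - 3)^2.
def grad(w):
    return 2.0 * (w - 3.0)   # dL/dw

w, lr = 0.0, 0.1             # initial weight and learning rate (assumed)
for _ in range(100):
    w -= lr * grad(w)        # w <- w - lr * dL/dw

print(round(w, 4))           # ~3.0, the minimizer of the loss
```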
Activation functions: Sigmoidal function

Drawback 1: Sigmoids saturate and kill gradients (when the neuron's activation saturates at either tail, 0 or 1).
Gradient ≈ 0 ⇒ weights fail to update during back-propagation.
Activation functions: Rectified Linear Unit (very popular).

[Figure: ~6x improvement in convergence for ReLU vs. tanh]

Advantage 1: Eliminates saturation and killing of gradients (in one direction).

Advantage 2: Sigmoid neurons involve expensive operations (exponentials, etc.), whereas ReLU can be implemented by simply thresholding activations at zero.
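For reference, a small sketch (mine) of the two activations discussed above, showing why saturated sigmoids kill gradients while ReLU does not for positive inputs:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1.0 - s)          # ~0 when |x| is large: the gradient is "killed"

def relu(x):
    return np.maximum(0.0, x)     # simple thresholding at zero

def relu_grad(x):
    return (x > 0).astype(float)  # 1 for positive inputs, 0 otherwise

x = np.array([-10.0, 0.0, 10.0])
print(sigmoid_grad(x))  # [~0, 0.25, ~0] -> saturates at both tails
print(relu_grad(x))     # [0, 0, 1]      -> no saturation for positive inputs
```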
Different layers of CNN architecture
Flattening, fully connected (FC) layer and softmax
Flattening
1. Vectorization (converting an M x N x D tensor to an MND x 1 vector).

FC layer
1. Multilayer perceptron.
2. Generally used in the final layers to classify the object.
3. Plays the role of a classifier.

Softmax layer
1. Normalizes the output into discrete class probabilities.
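A hedged sketch (the tensor shapes and class count are illustrative, not from the slides) of flattening, a fully connected layer, and softmax chained together:

```python
import numpy as np

def softmax(z):
    z = z - z.max()               # numerical stability
    e = np.exp(z)
    return e / e.sum()            # discrete class probabilities summing to 1

M, N, D, num_classes = 4, 4, 16, 10          # illustrative sizes
feature_maps = np.random.randn(M, N, D)      # output of the last conv/pool layer

x = feature_maps.reshape(-1)                 # flatten: M*N*D vector
W = np.random.randn(num_classes, x.size) * 0.01
b = np.zeros(num_classes)

logits = W @ x + b                           # fully connected layer
probs = softmax(logits)
print(round(probs.sum(), 6), probs.argmax()) # 1.0 and the predicted class index
```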
Cross-entropy Loss

What we want?
Cross-entropy Loss?
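Only the slide titles survive here, so as a hedged sketch (mine), the cross-entropy loss for a one-hot target and softmax probabilities:

```python
import numpy as np

def cross_entropy(probs, label):
    """probs: softmax output; label: integer index of the true class (one-hot target)."""
    return -np.log(probs[label] + 1e-12)   # small epsilon guards against log(0)

probs = np.array([0.1, 0.7, 0.2])          # illustrative predicted probabilities
print(cross_entropy(probs, label=1))       # ~0.357: correct class favored -> low loss
print(cross_entropy(probs, label=2))       # ~1.609: wrong class favored -> high loss
```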
A Real Life Application
Different layers of CNN architecture: A Review
Training very deep networks: ResNet
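The ResNet slides carry only their titles here; as a hedged sketch of the core idea (a residual/skip connection that makes very deep networks trainable), here is a minimal residual block. The layer sizes and the identity shortcut are my own illustrative choices.

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def residual_block(x, W1, W2):
    """y = relu(F(x) + x): the block learns a residual F(x) on top of the identity."""
    out = relu(W1 @ x)        # first transformation
    out = W2 @ out            # second transformation (no activation yet)
    return relu(out + x)      # skip connection: add the input back, then activate

d = 16                         # feature dimension (illustrative)
x = np.random.randn(d)
W1 = np.random.randn(d, d) * 0.1
W2 = np.random.randn(d, d) * 0.1
print(residual_block(x, W1, W2).shape)   # (16,): same shape as the input
```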
