CNN MLFA Ons-Part1
Why CNNs?
Topics
General and biological motivation
Simple cells:
1. Activity is characterized by a linear function of the image.
2. Operate in a spatially localized (SL) receptive field.
3. Each set responds to edges of a different orientation.
Complex cells:
1. Operate in a large SL receptive field.
2. Receive input from lower-level simple cells.
Hyper-complex cells:
1. Larger receptive field.
2. Receive input from lower-level complex cells.
Biological motivation - Grandmother cell
The grandmother cell is a hypothetical neuron that represents a complex but specific
concept or object, proposed by cognitive scientist Jerry Lettvin in 1969.
Biological motivation - CNN.
Back-propagation [Lang and Hinton, 1988] and the modern CNN [LeCun et al., 1989].
[Figure: feature maps at the input and at layers 1, 3, and 5 of the network.]
CNN for document recognition [LeCun et al., 1998].
[Diagram: classical pipeline, hand-crafted features feeding clustering or a shallow neural network, etc., for tasks such as detection, speaker ID, speech translation, and machine translation.]
Features: Classical
Filter banks
Edges and corners: Sobel, LoG, and Canny
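As a brief sketch of what such a classical filter-bank feature looks like in code (the image below is a random placeholder, not data from the slides):

    # A minimal sketch of a classical "filter bank" feature: Sobel edge responses.
    # `image` stands in for any 2-D grayscale array.
    import numpy as np
    from scipy.ndimage import convolve

    image = np.random.rand(64, 64)                 # placeholder grayscale image

    sobel_x = np.array([[-1, 0, 1],
                        [-2, 0, 2],
                        [-1, 0, 1]], dtype=float)  # horizontal-gradient kernel
    sobel_y = sobel_x.T                            # vertical-gradient kernel

    gx = convolve(image, sobel_x)                  # response to vertical edges
    gy = convolve(image, sobel_y)                  # response to horizontal edges
    edge_magnitude = np.hypot(gx, gy)              # combined edge strength per pixel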
[Figure: example CNN prediction, label "Face" with probability p = 0.94.]
Generalizable
1. The same neural net approach can be used for many different applications
and data types (e.g., hand-crafted face features cannot be used for pedestrian
detection, whereas the same CNN architecture can be used for both).
Scalable
2. Parameter sharing
a. Elementary feature detectors that are useful in one part of an image may be
useful in other parts as well.
CNN: Local connectivity (LC)
[Figure: small network with a hidden layer of 3 nodes illustrating local connectivity.]
In general, for a layer with m input nodes and n output nodes, CNN-style local connectivity of k inputs per output node (k < m) needs only k·n connections instead of the m·n of a fully connected layer, as the sketch below illustrates:
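A minimal numeric check of that comparison; m, n, and k are hypothetical layer sizes, not values from the slides:

    # Connection counts: fully connected vs. locally connected vs. shared (conv).
    m, n, k = 1024, 256, 9                     # illustrative sizes, k << m

    fully_connected_weights = m * n            # 262,144 weights
    locally_connected_weights = k * n          # 2,304 weights
    shared_weights = k                         # with parameter sharing, just k weights
    print(fully_connected_weights, locally_connected_weights, shared_weights)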
[Figure: local connectivity with a single input channel vs. two input channels.]
CNN with multiple output maps
Local connectivity
[Figure: convolution without padding or stride vs. with padding [1, 1] and stride [2, 2].]
CONVOLUTIONAL LAYER
1. Accepts a volume of size W1 × H1 × D1.
2. Requires four hyperparameters:
a. Number of filters K
b. their spatial extent F
c. their stride S
d. the amount of zero padding P
3. Produces an output volume of size W2 × H2 × D2, where:
W2 = (W1 − F + 2P)/S + 1, H2 = (H1 − F + 2P)/S + 1, D2 = K
4. With parameter sharing, it introduces F⋅F⋅D1 weights per filter, for a total of
(F⋅F⋅D1)⋅K weights and K biases.
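A minimal helper that implements the output-size and parameter-count formulas above; the sample call (32 × 32 × 3 input, K = 10, F = 5, S = 1, P = 2) is an assumed example, not one from the slides:

    # Output shape and parameter count of a conv layer, per the formulas above.
    def conv_layer_shape(W1, H1, D1, K, F, S, P):
        W2 = (W1 - F + 2 * P) // S + 1
        H2 = (H1 - F + 2 * P) // S + 1
        D2 = K
        weights = (F * F * D1) * K          # shared across all spatial positions
        biases = K
        return (W2, H2, D2), weights, biases

    print(conv_layer_shape(32, 32, 3, K=10, F=5, S=1, P=2))
    # -> ((32, 32, 10), 750, 10)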
Hyperparameters for the convolutional layer
Dilation
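Briefly, dilation spreads a filter's taps apart, so a filter of extent F with dilation d covers d·(F − 1) + 1 input positions without adding weights. A minimal 1-D sketch with assumed toy inputs (not from the slides):

    # Dilated 1-D convolution: same number of weights, wider receptive field.
    import numpy as np

    def dilated_conv1d(x, w, dilation=1):
        F = len(w)
        span = dilation * (F - 1) + 1            # effective width of the filter
        out_len = len(x) - span + 1              # no padding, stride 1
        return np.array([sum(w[k] * x[i + k * dilation] for k in range(F))
                         for i in range(out_len)])

    x = np.arange(10, dtype=float)
    w = np.array([1.0, 0.0, -1.0])               # simple difference filter
    print(dilated_conv1d(x, w, dilation=1))      # compares samples 2 apart
    print(dilated_conv1d(x, w, dilation=2))      # compares samples 4 apart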
1. Reduces the spatial size of the representation, which reduces the number of
parameters and the amount of computation in the network.
2. Average pooling or L2 pooling can also be used, but they are not as popular as max pooling.
POOLING LAYER
1. Accepts a volume of size W1 × H1 × D1.
2. Requires three hyperparameters:
a. their spatial extent F
b. their stride S
c. the amount of zero padding P (commonly P = 0)
3. Produces an output volume of size W2 × H2 × D2, where:
W2 = (W1 − F + 2P)/S + 1, H2 = (H1 − F + 2P)/S + 1, D2 = D1
4. Introduces zero parameters since it computes a fixed function of the input.
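A minimal NumPy sketch of 2 × 2 max pooling with stride 2 and P = 0 on a single feature map, matching the formulas above; the toy 4 × 4 input is an assumed example:

    # Max pooling: a fixed function of the input, no learned parameters.
    import numpy as np

    def max_pool2d(x, F=2, S=2):
        H1, W1 = x.shape
        H2, W2 = (H1 - F) // S + 1, (W1 - F) // S + 1
        out = np.empty((H2, W2))
        for i in range(H2):
            for j in range(W2):
                out[i, j] = x[i * S:i * S + F, j * S:j * S + F].max()
        return out

    x = np.arange(16, dtype=float).reshape(4, 4)
    print(max_pool2d(x))        # 4x4 map -> 2x2 map, each entry a local maximum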
Different layers of CNN architecture
Recap: Gradient descent
Recap: Backpropagation
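As a compact recap sketch on an assumed toy problem (not from the slides): gradient descent on a one-weight model, with the gradient obtained by the chain rule as in back-propagation:

    # Gradient descent on y_hat = w * x with squared-error loss.
    import numpy as np

    x = np.array([1.0, 2.0, 3.0])
    y = np.array([2.0, 4.0, 6.0])             # true relation: y = 2x
    w, lr = 0.0, 0.1                          # initial weight, learning rate

    for step in range(50):
        y_hat = w * x                         # forward pass
        loss = np.mean((y_hat - y) ** 2)
        grad = np.mean(2 * (y_hat - y) * x)   # chain rule: dL/dy_hat * dy_hat/dw
        w -= lr * grad                        # gradient-descent update
    print(w)                                  # converges toward 2.0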
Activation functions: Sigmoidal function
Drawback 1: Sigmoids saturate and kill gradients. When the neuron's activation saturates
at either tail (near 0 or 1), the local gradient is nearly
0 ⇒ the weights fail to update during back-propagation.
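A quick numeric illustration with assumed toy values: the sigmoid's local gradient σ(x)·(1 − σ(x)) is near zero at both tails, so almost no error signal flows back through a saturated unit:

    # Sigmoid saturation: the gradient collapses at the tails.
    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    for x in (-10.0, 0.0, 10.0):
        s = sigmoid(x)
        print(f"x={x:+.0f}  sigma(x)={s:.5f}  gradient={s * (1 - s):.2e}")
    # gradient is ~4.5e-05 at |x| = 10, versus 0.25 at x = 0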
Activation functions: Rectified Linear Unit (very popular).
[Figure: training-error curves for tanh vs. ReLU, showing roughly a 6× improvement in convergence with ReLU.]
FC layer
1. Multilayer perceptron.
2. Generally used in final layers to classify the object.
3. Role of a classifier.
[Figure: final fully connected layer followed by a softmax.]
Softmax layer
1. Normalizes the outputs into discrete class probabilities.
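A minimal sketch of that normalization (the input scores are assumed example values); subtracting the maximum before exponentiating is a standard numerical-stability trick:

    # Softmax: raw scores -> non-negative values that sum to 1.
    import numpy as np

    def softmax(scores):
        shifted = scores - np.max(scores)        # avoid overflow in exp
        exp_scores = np.exp(shifted)
        return exp_scores / exp_scores.sum()

    print(softmax(np.array([2.0, 1.0, 0.1])))    # e.g. [0.659, 0.242, 0.099]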
Cross-entropy Loss
What do we want?
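A minimal sketch of the cross-entropy loss for a single example with a one-hot target, reusing the softmax probabilities from the sketch above (assumed values): the loss is −log of the probability assigned to the true class, small when the network is confidently correct and large otherwise:

    # Cross-entropy for one example with a one-hot target.
    import numpy as np

    def cross_entropy(probs, true_class):
        return -np.log(probs[true_class])

    probs = np.array([0.659, 0.242, 0.099])      # softmax output from above
    print(cross_entropy(probs, true_class=0))    # ~0.417 (confident and correct)
    print(cross_entropy(probs, true_class=2))    # ~2.313 (true class got low probability)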
A Real-Life Application
Different layers of CNN architecture: A Review
Training very deep networks: ResNet
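Since the slides only name ResNet here, the following is a hedged sketch of its central idea: a residual block outputs F(x) + x, and the identity skip path keeps gradients flowing through very deep stacks. The weights below are random placeholders; a real block would use convolutions and batch normalization:

    # Residual (skip) connection: output F(x) + x instead of F(x).
    import numpy as np

    def relu(x):
        return np.maximum(0.0, x)

    def residual_block(x, W1, W2):
        out = relu(x @ W1)          # first "layer" of the block
        out = out @ W2              # second "layer", no activation yet
        return relu(out + x)        # skip connection: add the input back

    dim = 8
    x = np.random.randn(dim)
    W1, W2 = np.random.randn(dim, dim) * 0.1, np.random.randn(dim, dim) * 0.1
    print(residual_block(x, W1, W2).shape)   # same shape as the input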