Images and Convolutional Neural Networks: Practical Deep Learning
Neural networks
Computer vision
From picture to pixels
Convolutional neural networks
Convolutional neural network (CNN, ConvNet)
● Dense or fully-connected layer: each neuron is connected to all neurons in the previous layer
● CNN: each neuron is connected only to a small “local” set of neurons
● Radically reduces the number of network connections
[Figure: dense layer vs. convolutional layer connectivity]
Convolution for image data
● Image represented as a 2D grid of values
[Figure: a 3✕3 image area multiplied by the 3✕3 weights (conv. kernel) gives one output neuron]
Image source: https://mlnotebook.github.io/post/CNN1/
Convolution for image data
● We repeat for each output neuron
● Weights stay the same (shared weights)
● Border effect: without padding the output area is smaller
● Outputs form a “feature map”
[Figure: the 3✕3 weights (conv. kernel) slid over the image input to produce the feature map]
Image source: https://mlnotebook.github.io/post/CNN1/
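For concreteness, a minimal PyTorch sketch of a single 3✕3 convolution (not part of the original slides; the image size is arbitrary). Without padding, the output feature map is slightly smaller than the input:

```python
import torch
import torch.nn as nn

# One 3x3 convolution kernel over a single-channel (grayscale) image.
# With padding=0 ("valid" convolution), an H x W input gives an (H-2) x (W-2) feature map.
conv = nn.Conv2d(in_channels=1, out_channels=1, kernel_size=3, padding=0)

image = torch.randn(1, 1, 28, 28)   # batch of 1, 1 channel, 28x28 pixels
feature_map = conv(image)
print(feature_map.shape)            # torch.Size([1, 1, 26, 26])
```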
A real example
Convolution for image data
● We can repeat for different sets of weights (kernels)
● Each learns a different “feature”
● Typically: edges, corners, etc.
● Each outputs a feature map
[Figure: K kernels, each 5✕5(✕3), applied to a 256✕256✕3 image produce K feature maps, each 252✕252✕1]
Convolution for image data
● We stack the feature maps into a single tensor
● Depth of the output tensor = number of kernels K
● The tensor is the output of the entire convolutional layer
[Figure: K kernels, each 5✕5(✕3), applied to a 256✕256✕3 image produce a 252✕252✕K output tensor]
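The dimensions on these two slides can be checked with a short PyTorch sketch (not from the original slides; the value of K below is arbitrary):

```python
import torch
import torch.nn as nn

K = 16                               # number of kernels (arbitrary choice for this sketch)

# K kernels of size 5x5(x3) applied to a 3-channel 256x256 image.
# Each kernel yields one 252x252 feature map; stacked, they form the 252x252xK output tensor.
conv = nn.Conv2d(in_channels=3, out_channels=K, kernel_size=5, padding=0)

image = torch.randn(1, 3, 256, 256)  # PyTorch is channels-first: (batch, C, H, W)
out = conv(image)
print(out.shape)                     # torch.Size([1, 16, 252, 252])
```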
Convolution in layers: intuition
● We can then add another convolutional layer
● This operates on the previous layer’s output tensor (feature maps)
● Features layered from simple to more complex
[Figure: stacked convolutional layers transforming an input image into the prediction “cat”]
[Figure: learned low-level features → learned mid-level features → learned high-level features → learned classifier → “cat”]
Image from lecture by Yann Le Cun, original from Zeiler & Fergus (2013)
Image datasets
Convolutional layers
Pooling layers
Non-Linearity Layer
Activation: Sigmoid
• Sigmoid function σ: takes a real-valued number and “squashes” it into the range between 0 and 1
§ The output can be interpreted as the firing rate of a biological neuron
o Not firing = 0; fully firing = 1
§ When a neuron’s activations are near 0 or 1, sigmoid neurons saturate
o Gradients at these regions are almost zero (almost no signal will flow)
§ Sigmoid activations are less common in modern NNs
f(x) = σ(x) = 1 / (1 + e^(−x)): ℝ → (0, 1)
Activation: Tanh
• Tanh function: takes a real-valued number and “squashes” it into the range between -1 and 1
§ Like sigmoid, tanh neurons saturate
§ Unlike sigmoid, the output is zero-centered
o It is therefore preferred over the sigmoid
§ Tanh is a scaled sigmoid: tanh(x) = 2σ(2x) − 1
f(x) = tanh(x): ℝ → (−1, 1)
Activation: ReLU
• ReLU (Rectified Linear Unit): takes a real-valued number and thresholds it at zero
f(x) = max(0, x): ℝ → [0, ∞)
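A quick numerical sketch of these three activations, using PyTorch’s built-in functions (not part of the original slides):

```python
import torch

x = torch.tensor([-2.0, 0.0, 2.0])

print(torch.sigmoid(x))  # tensor([0.1192, 0.5000, 0.8808])   -> values in (0, 1)
print(torch.tanh(x))     # tensor([-0.9640, 0.0000, 0.9640])  -> values in (-1, 1), zero-centered
print(torch.relu(x))     # tensor([0., 0., 2.])               -> negative inputs clipped to 0

# tanh as a scaled sigmoid: tanh(x) = 2*sigmoid(2x) - 1
print(torch.allclose(torch.tanh(x), 2 * torch.sigmoid(2 * x) - 1))  # True
```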
Activation: Leaky ReLU
Activation: Linear Function
• A linear activation means that the output signal is proportional to the input signal to the neuron
f(x) = cx: ℝ → ℝ
§ If the value of the constant c is 1, it is also called the identity activation function
§ This activation type is used in regression problems
o E.g., the last layer can have a linear activation function in order to output a real number (and not a class membership)
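As a small illustration (a sketch, not from the slides; layer sizes are arbitrary), a PyTorch regression head simply ends in a linear layer with no activation:

```python
import torch.nn as nn

# Regression head: the last layer has no activation (i.e. a linear / identity activation),
# so the network can output any real number instead of a class score.
model = nn.Sequential(
    nn.Linear(16, 32),
    nn.ReLU(),
    nn.Linear(32, 1),   # linear output -> one real-valued prediction
)
```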
Fully Connected Layer
Key Features:
• In CNNs, FC layers typically come after the convolutional and pooling layers. The 2D spatial structure of the data is flattened into a 1D vector, which the FC layers then process for tasks such as classification.
• The number of neurons in the final FC layer usually matches the number of output classes in a classification problem. For instance, in a 10-class digit classification problem, the final FC layer has 10 neurons, each outputting a score for one of the classes.
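A minimal sketch of this conv → pool → flatten → fully connected pattern for a 10-class problem (not from the original slides; the layer sizes and the 28✕28 input are illustrative):

```python
import torch
import torch.nn as nn

# Minimal CNN for 10-class classification of 28x28 grayscale images (sizes are illustrative).
model = nn.Sequential(
    nn.Conv2d(1, 16, kernel_size=3, padding=1),  # convolutional layer -> 16 feature maps
    nn.ReLU(),
    nn.MaxPool2d(2),                             # pooling layer -> 14x14
    nn.Conv2d(16, 32, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(2),                             # -> 7x7
    nn.Flatten(),                                # flatten the 32x7x7 feature maps into a 1D vector
    nn.Linear(32 * 7 * 7, 128),                  # fully connected layer
    nn.ReLU(),
    nn.Linear(128, 10),                          # final FC layer: one score per class
)

scores = model(torch.randn(1, 1, 28, 28))
print(scores.shape)                              # torch.Size([1, 10])
```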
Typical architecture
AlexNet
VGG
Inception / GoogLeNet
ResNet
DenseNet
Large-scale CNNs with pre-trained weights
[Figure: a pre-trained CNN reused for a new task; the output layer is replaced and retrained, while the earlier layers provide the extracted features]
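One common way to apply this recipe, sketched with torchvision’s pre-trained ResNet-50 (the model choice and the number of classes are assumptions, not from the slides): freeze the pre-trained layers so they act as a feature extractor, then replace and retrain only the output layer.

```python
import torch.nn as nn
from torchvision import models

# Load a CNN with weights pre-trained on ImageNet.
model = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)

# Freeze the pre-trained layers: they now act as a fixed feature extractor.
for param in model.parameters():
    param.requires_grad = False

# Replace the output layer with a new one sized for our own task (here: 5 classes, an
# arbitrary example) and retrain only this layer.
model.fc = nn.Linear(model.fc.in_features, 5)
```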