
Images and convolutional neural networks

Practical deep learning

1
Computer vision

● Computer vision = giving computers the ability to understand visual information
● Examples:
  ○ A robot that can move around obstacles by analysing the input of its camera(s)
  ○ A computer system finding images of cats among millions of images on the Internet

2
From picture to pixels

● An image has to be digitized for computer processing: it is turned into millions of “pixel” elements
● Each pixel is a set of numbers quantifying the color of that element

0.49411765 0.49411765 0.4745098 0.49019608 0.4745098
0.49411765 0.49411765 0.5058824 0.49411765 0.49803922
0.49803922 0.49411765 0.4862745 0.47058824 0.49411765
0.5019608 0.49803922 0.49803922 0.49019608 0.50980395
0.50980395 0.5058824 0.52156866 0.50980395 0.5058824

Picture source: https://pixabay.com/en/kitty-cat-kid-cat-domestic-cat-2948404/
3
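A minimal sketch of this digitization step, assuming Pillow and NumPy are available; the file name kitty.jpg is only a placeholder for the picture above:

# Sketch: digitize an image into normalized pixel values.
import numpy as np
from PIL import Image

img = Image.open("kitty.jpg").convert("L")           # load and convert to grayscale
pixels = np.asarray(img, dtype=np.float32) / 255.0   # scale values to [0, 1]

print(pixels.shape)    # (height, width)
print(pixels[:5, :5])  # a 5x5 block of values, like the grid above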
From pixels to … understanding?

0.49411765 0.49411765 0.4745098 0.49019608 0.4745098
0.49411765 0.49411765 0.5058824 0.49411765 0.49803922
0.49803922 0.49411765 0.4862745 0.47058824 0.49411765
0.5019608 0.49803922 0.49803922 0.49019608 0.50980395
0.50980395 0.5058824 0.52156866 0.50980395 0.5058824

⇒ “There’s a cat among some flowers in the grass”

● This is easy for humans
● But for AI it’s actually one of the harder problems!
● How do you transform that grid of numbers into understanding… or even something useful?
4
Image understanding
• Humans are so good at vision that it’s not even considered intelligence

5
Convolutional neural networks
Convolutional neural network (CNN, ConvNet)

[Figure: dense layer vs. convolutional layer connectivity]

● Dense or fully-connected layer: each neuron connected to all neurons in the previous layer
● CNN: each neuron connected only to a small “local” set of neurons
● Radically reduces the number of network connections

7
Convolution for image data
[Figure: a 3✕3 image area multiplied by 3✕3 weights (conv. kernel) gives one output neuron]

● Image represented as a 2D grid of values
● Each output neuron connected to a small 2D area in the image
● Output value = weighted sum of inputs
● Idea: nearby pixels are related ⇒ we can learn local relationships of pixels
8
Image source: https://mlnotebook.github.io/post/CNN1/
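A minimal sketch of this weighted sum for a single output neuron, using NumPy and made-up values for the 3✕3 image area and the 3✕3 kernel:

# Sketch: one output neuron of a convolution is a weighted sum
# of a 3x3 image area and a 3x3 kernel (illustrative values only).
import numpy as np

patch  = np.array([[0.49, 0.49, 0.47],
                   [0.49, 0.51, 0.49],
                   [0.50, 0.49, 0.49]])   # 3x3 image area
kernel = np.array([[ 1.0, 0.0, -1.0],
                   [ 1.0, 0.0, -1.0],
                   [ 1.0, 0.0, -1.0]])    # 3x3 weights (e.g., a vertical edge detector)

output_neuron = np.sum(patch * kernel)    # weighted sum of inputs
print(output_neuron)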
Convolution for image data
[Figure: sliding the 3✕3 weights (conv. kernel) over the image input produces a feature map]

● We repeat for each output neuron
● Weights stay the same (shared weights)
● Border effect: without padding the output area is smaller
● Outputs form a “feature map”
9
Image source: https://mlnotebook.github.io/post/CNN1/
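A minimal NumPy sketch of sliding one shared 3✕3 kernel over a toy image; it also shows the border effect (the feature map is smaller than the input when no padding is used):

# Sketch: sliding the same kernel over the whole image (shared weights),
# with no padding ("valid" convolution), so the feature map shrinks.
import numpy as np

def conv2d_valid(image, kernel):
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    fmap = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            fmap[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
    return fmap

image  = np.random.rand(8, 8)             # toy 8x8 image
kernel = np.random.rand(3, 3)             # shared 3x3 weights
print(conv2d_valid(image, kernel).shape)  # (6, 6): border effect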
A real example

Image from: http://cs.nyu.edu/~fergus/tutorials/deep_learning_cvpr12/fergus_dl_tutorial_final.pptx


Side note: color images
● Example: 256 ✕ 256 color image with 3 color channels (red, green,
and blue)
⇒ single image is a 3D tensor: 256 ✕ 256 ✕ 3
● Example: 5 ✕ 5 convolution is actually also a 3D tensor: 5 ✕ 5 ✕ 3
● The kernel slides over width and height, but covers the full color depth

11
Convolution for image data

[Figure: K kernels, each 5✕5(✕3), applied to a 256✕256✕3 image produce K feature maps, each 252✕252✕1]

● We can repeat for different sets of weights (kernels)
● Each learns a different “feature”
● Typically: edges, corners, etc.
● Each outputs a feature map
12
Convolution for image data
[Figure: K kernels, each 5✕5(✕3), applied to a 256✕256✕3 image produce an output tensor of 252✕252✕K]

● We stack the feature maps into a single tensor
● Depth of the output tensor = number of kernels K
● This tensor is the output of the entire convolutional layer
13
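A short tf.keras sketch checking these shapes; K = 32 is an arbitrary choice of the number of kernels:

# Sketch (tf.keras): K kernels of size 5x5 over a 256x256x3 image
# produce an output tensor of 252x252xK when no padding is used.
import tensorflow as tf
from tensorflow.keras import layers

K = 32                                               # number of kernels (chosen for illustration)
x = tf.random.uniform((1, 256, 256, 3))              # one RGB image
conv = layers.Conv2D(filters=K, kernel_size=5, padding="valid")
print(conv(x).shape)                                 # (1, 252, 252, 32)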
Convolution in layers: intuition
[Figure: stacked convolutional layers transform an image into the prediction “cat”]

● We can then add another convolutional layer
● This operates on the previous layer’s output tensor (feature maps)
● Features layered from simple to more complex
14
[Figure: learned low-level features → learned mid-level features → learned high-level features → learned classifier → “cat”]

Image from lecture by Yann Le Cun, original from Zeiler & Fergus (2013)

15
Image datasets

• Color image mini-batches are 4D tensors: samples ✕ width ✕ height ✕ color channels
• Plenty of big datasets for training exist, e.g., ImageNet with 1.2 million images in 1000 classes
• Data augmentation for small datasets: generate more training data by transforming existing data
  • E.g., shifting, rotation, cropping, scaling, adding noise, etc. (see the sketch below)

16
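A possible data augmentation sketch using the tf.keras preprocessing layers (assuming a recent TensorFlow version that provides RandomFlip, RandomRotation, RandomTranslation and RandomZoom):

# Sketch: generate transformed training samples with tf.keras preprocessing layers.
import tensorflow as tf
from tensorflow.keras import layers

augment = tf.keras.Sequential([
    layers.RandomFlip("horizontal"),      # mirror images left-right
    layers.RandomRotation(0.1),           # rotate by up to ~36 degrees
    layers.RandomTranslation(0.1, 0.1),   # shift in height and width
    layers.RandomZoom(0.1),               # zoom in/out (cropping/scaling)
])

images = tf.random.uniform((8, 256, 256, 3))   # a toy mini-batch
augmented = augment(images, training=True)     # new, transformed samples
print(augmented.shape)                         # (8, 256, 256, 3)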
Convolutional layers

• Input: tensor of size N × Wi × Hi × Ci


• Hyperparameters:
• K: number of filters
• w, h: kernel size
• padding: how to handle image borders
• activation function
• Output: tensor of size N × Wo × Ho × K
• In tf.keras (see the sketch below):
  layers.Conv2D(filters, kernel_size, padding=..., activation=...)

  (there are also Conv1D and Conv3D)

17
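A short sketch of the Conv2D hyperparameters above, showing how the padding choice affects the output size:

# Sketch: "same" padding keeps the spatial size, "valid" shrinks it.
import tensorflow as tf
from tensorflow.keras import layers

x = tf.random.uniform((4, 64, 64, 3))   # N=4, Wi=Hi=64, Ci=3

same  = layers.Conv2D(filters=16, kernel_size=3, padding="same",  activation="relu")
valid = layers.Conv2D(filters=16, kernel_size=3, padding="valid", activation="relu")

print(same(x).shape)    # (4, 64, 64, 16): borders are zero-padded
print(valid(x).shape)   # (4, 62, 62, 16): output area is smaller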
Pooling layers

• Used to reduce the spatial resolution
  • operates independently on each channel
  • reduces complexity and the number of parameters
• MAX operator most common
  • sometimes also AVERAGE
• In tf.keras:
layers.MaxPooling2D(pool_size)
layers.AveragePooling2D(pool_size)

18 Image from http://cs231n.github.io/convolutional-networks/
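A short sketch of the pooling layers above, showing the spatial resolution being halved while the number of channels stays the same:

# Sketch: pooling halves width and height, channel by channel.
import tensorflow as tf
from tensorflow.keras import layers

x = tf.random.uniform((4, 64, 64, 16))
print(layers.MaxPooling2D(pool_size=2)(x).shape)       # (4, 32, 32, 16)
print(layers.AveragePooling2D(pool_size=2)(x).shape)   # (4, 32, 32, 16)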


• Flatten
  • flattens the input into a vector (typically before dense layers)
• Dropout
  • works the same way as with dense layers
• In tf.keras:
  layers.Flatten()
  layers.Dropout(rate)

19
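A short sketch of Flatten and Dropout applied to a feature-map tensor:

# Sketch: Flatten turns the feature-map tensor into a vector;
# Dropout randomly zeroes a fraction of the values during training.
import tensorflow as tf
from tensorflow.keras import layers

x = tf.random.uniform((4, 32, 32, 16))
flat = layers.Flatten()(x)
print(flat.shape)                                       # (4, 16384)
dropped = layers.Dropout(rate=0.5)(flat, training=True)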
Non-Linearity Layer

• Non-linear activations are needed to learn complex (non-linear) data representations
  • Otherwise, NNs would be just a linear function (such as W₂(W₁x) = (W₂W₁)x = Wx)
  • NNs with a large number of layers (and neurons) can approximate more complex functions

20
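A tiny NumPy sketch of this point: without a non-linearity in between, two weight matrices collapse into a single linear map:

# Sketch: W2 (W1 x) equals (W2 W1) x, i.e., one equivalent weight matrix W.
import numpy as np

rng = np.random.default_rng(0)
W1, W2 = rng.normal(size=(4, 3)), rng.normal(size=(2, 4))
x = rng.normal(size=3)

two_layers = W2 @ (W1 @ x)
one_layer  = (W2 @ W1) @ x                   # a single equivalent weight matrix W
print(np.allclose(two_layers, one_layer))    # True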
Activation: Sigmoid

• Sigmoid function σ: takes a real-valued number and “squashes” it into the range between 0 and 1
§ The output can be interpreted as the firing rate of a biological neuron
o Not firing = 0; Fully firing = 1
§ When the neuron’s activations are near 0 or 1, sigmoid neurons saturate
o Gradients at these regions are almost zero (almost no signal will flow)
§ Sigmoid activations are less common in modern NNs

f(x) = σ(x) = 1 / (1 + e^(−x)),   ℝ → (0, 1)

Slide credit: Ismini Lourentzou – Introduction to Deep Learning 21


Activation: Tanh

• Tanh function: takes a real-valued number and “squashes” it into range between -1 and 1
§ Like sigmoid, tanh neurons saturate
§ Unlike sigmoid, the output is zero-centered
o It is therefore preferred over sigmoid
§ Tanh is a scaled sigmoid: tanh(x) = 2σ(2x) − 1

f(x) = tanh(x),   ℝ → (−1, 1)

Slide credit: Ismini Lourentzou – Introduction to Deep Learning 22


Activation: ReLU

• ReLU (Rectified Linear Unit): takes a real-valued number and thresholds it at zero

f(x) = max(0, x),   ℝ → [0, ∞)

§ Most modern deep NNs use ReLU activations
§ ReLU is fast to compute
  o Compared to sigmoid, tanh
  o Simply threshold a matrix at zero
§ Accelerates the convergence of gradient descent
  o Due to its linear, non-saturating form
§ Prevents the vanishing gradient problem

23
Activation: Leaky ReLU

• The problem of ReLU activations: they can “die”
  § ReLU could cause the weights to update in such a way that the gradients become zero and the neuron will never activate again on any data
  § E.g., when a large learning rate is used

• Leaky ReLU activation function is a variant of ReLU
  § Instead of the function being 0 when x < 0, a leaky ReLU has a small negative slope (e.g., α = 0.01, or similar)
  § This resolves the dying ReLU problem
  § Most current works still use ReLU
    o With a proper setting of the learning rate, the problem of dying ReLU can be avoided

f(x) = αx for x < 0,  x for x ≥ 0

24
Activation: Linear Function

• Linear function means that the output signal is proportional to the input signal of the neuron

f(x) = c·x,   ℝ → ℝ

  § If the value of the constant c is 1, it is also called the identity activation function
  § This activation type is used in regression problems
    o E.g., the last layer can have a linear activation function, in order to output a real number (and not a class membership)

25
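The activation functions discussed above, written out as a small NumPy sketch (the constants α and c are example values):

# Sketch: the activations above as plain NumPy functions.
import numpy as np

def sigmoid(x):            return 1.0 / (1.0 + np.exp(-x))   # R -> (0, 1)
def tanh(x):               return np.tanh(x)                 # R -> (-1, 1), equals 2*sigmoid(2x) - 1
def relu(x):               return np.maximum(0.0, x)         # R -> [0, inf)
def leaky_relu(x, a=0.01): return np.where(x < 0, a * x, x)  # small slope for x < 0
def linear(x, c=1.0):      return c * x                      # identity when c = 1

x = np.linspace(-3, 3, 7)
print(relu(x))
print(leaky_relu(x))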
Fully Connected Layer

• A Fully Connected (FC) layer, also known as a dense layer, is a


type of layer used in artificial neural networks where each neuron
or node from the previous layer is connected to each neuron of
the current layer.
• It’s called “fully connected” because of this complete linkage. FC layers are typically found towards the end of a neural network architecture and are responsible for producing the final output predictions.

26
Fully Connected Layer

Key Features:
• In CNNs, FC layers often come after the convolutional and pooling
layers. They are used to flatten the 2D spatial structure of the
data into a 1D vector and process this data for tasks like
classification.
• The number of neurons in the final FC layer usually matches the
number of output classes in a classification problem. For instance,
for a 10-class digit classification problem, there would be 10
neurons in the final FC layer, each outputting a score for one of
the classes.

27
Typical architecture

1. Input layer = image pixels


2. Convolution
3. ReLU
4. Pooling
   (repeat steps 2–4 one or more times)
5. One or more fully connected layers (+ReLU)
6. Final fully connected layer to get to the number of
classes we want
7. Softmax to get probability distribution over classes
28
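A minimal tf.keras sketch of this typical architecture; the 28✕28✕1 input size, the filter counts and the 10 output classes are illustrative choices, not values from the slides:

# Sketch: input -> [conv + ReLU + pooling] x2 -> dense -> softmax over classes.
import tensorflow as tf
from tensorflow.keras import layers

model = tf.keras.Sequential([
    tf.keras.Input(shape=(28, 28, 1)),             # 1. input layer = image pixels
    layers.Conv2D(32, 3, activation="relu"),       # 2.-3. convolution + ReLU
    layers.MaxPooling2D(2),                        # 4. pooling
    layers.Conv2D(64, 3, activation="relu"),       # repeat steps 2-4
    layers.MaxPooling2D(2),
    layers.Flatten(),
    layers.Dense(128, activation="relu"),          # 5. fully connected (+ReLU)
    layers.Dense(10, activation="softmax"),        # 6.-7. output layer + softmax
])
model.summary()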
CNN architectures and
applications

29
AlexNet

VGG

30
Inception /
GoogLeNet

ResNet

DenseNet

31
Large-scale CNNs with pre-trained weights
[Figure: a pre-trained CNN re-used either by taking extracted features from an intermediate layer, or by replacing the output layer and retraining the top layers]

• For many applications, an existing CNN can be re-used instead of training a new model from scratch: extract features from a suitable layer, or retrain the top layers with new data (see the sketch below)
• Keras contains several models trained with ImageNet:
  • Xception, VGG16, VGG19, ResNet50, InceptionV3, InceptionResNetV2, MobileNet, DenseNet, NASNet
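A possible transfer-learning sketch using one of the listed Keras models (VGG16) as a frozen feature extractor with a replaced output layer; the 5 output classes and the 224✕224 input size are example choices:

# Sketch: re-use a pre-trained ImageNet model and add a new output layer.
import tensorflow as tf
from tensorflow.keras import layers

base = tf.keras.applications.VGG16(weights="imagenet", include_top=False,
                                   input_shape=(224, 224, 3))
base.trainable = False                       # freeze the pre-trained weights

model = tf.keras.Sequential([
    base,                                    # extracted features
    layers.GlobalAveragePooling2D(),
    layers.Dense(5, activation="softmax"),   # replaced output layer (5 classes as an example)
])
model.summary()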
Computer vision applications

Image credit: Li Fei-Fei et al


33
Image credit: Noh et al, Learning Deconvolution Network for Semantic Segmentation, ICCV 2015
Some selected applications

• Object detection: https://pjreddie.com/darknet/yolo/


• Semantic segmentation:
https://www.youtube.com/watch?v=qWl9idsCuLQ
• Human pose estimation:
https://www.youtube.com/watch?v=pW6nZXeWlGM
• Video recognition: https://valossa.com/
• Digital pathology: https://www.aiforia.com/

34
