Convolutional Networks
Neural Networks with Applications to Vision and Language
Michael Felsberg, Computer Vision Laboratory, Department of Electrical Engineering
Marco Kuhlmann, Natural Language Processing Lab, Department of Computer Science
Data types
      Single channel                            Multi-channel
1D    audio waveform                            state vector of an animated stick man
2D    phase space of an audio signal            color image
      (time and frequency axes)
3D    density data from a CT scan               color video
Convolution
http://bmia.bmt.tue.nl/education/courses/fev/course/notebooks/Convolution.html
Comments
• convolution is much more general:
– over any field, not just the real numbers
– in any dimensionality, not just 1D: tensors
– also on non-flat domains, e.g. spheres
– not just for shifts, e.g. also rotations
• terminology: filter (filter mask), impulse response, kernel
• output: response, feature map
• commutative; related to cross-correlation by flipping the kernel
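
For reference, the standard discrete 2D definitions (added here; they are not spelled out on the slide) make the flipping relation explicit:

    (I * K)(i, j)     = \sum_{m,n} I(m, n) K(i - m, j - n)    (convolution)
    (I \star K)(i, j) = \sum_{m,n} I(i + m, j + n) K(m, n)    (cross-correlation)

Convolution thus equals cross-correlation with a flipped kernel.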
Comments
• the linear space of integrable functions, with the product given by convolution, forms a commutative algebra
• known from signal processing: filter banks (analysis and synthesis) as subspace projections
• dimensionality examples:
– color images: 3D tensors (2 spatial coordinates, 1 channel coordinate)
– batches: 4D tensors (4th axis: example index)
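
A minimal sketch of these conventions in NumPy; the sizes and the channels-last (NHWC) axis order are illustrative assumptions, frameworks differ:

    import numpy as np

    image = np.zeros((224, 224, 3))      # one color image: 2 spatial axes + channels
    batch = np.zeros((32, 224, 224, 3))  # batch of 32 images: example index as 4th axis
    print(image.ndim, batch.ndim)        # 3 4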
Algorithmic
• kernel flipping is irrelevant for learned coefficients
• 1D convolution: Toeplitz matrix
• 2D convolution: doubly block circulant matrix
• sparse matrices
• boundary conditions (valid, reflective, periodic, zeros)
http://www.deeplearningbook.org/
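
A hedged sketch of the Toeplitz-matrix view of 1D convolution; the signal and kernel are made up, and scipy.linalg.toeplitz does the bookkeeping:

    import numpy as np
    from scipy.linalg import toeplitz

    x = np.array([1.0, 2.0, 3.0, 4.0])   # input signal (illustrative)
    k = np.array([1.0, -1.0])            # kernel (illustrative)

    # Full convolution as a sparse, banded Toeplitz matrix: the kernel runs
    # down the first column, the first row is kernel[0] followed by zeros.
    col = np.concatenate([k, np.zeros(len(x) - 1)])
    row = np.concatenate([[k[0]], np.zeros(len(x) - 1)])
    T = toeplitz(col, row)               # shape (len(x)+len(k)-1, len(x))

    print(T @ x)                         # matches np.convolve(x, k)
    print(np.convolve(x, k))             # [ 1.  1.  1.  1. -4.]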
Motivation for CNNs
http://www.deeplearningbook.org/
Parameter sharing
• tied weights
• reduced storage requirements
• but same time complexity
• sometimes sharing should be limited, e.g. for cropped images
http://www.deeplearningbook.org/
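
A back-of-the-envelope comparison of parameter counts; the layer sizes are made up for illustration:

    # Fully connected layer mapping a 32x32 input to a 32x32 output:
    dense_params = (32 * 32) * (32 * 32)   # 1,048,576 weights
    # Convolutional layer with one shared 3x3 kernel over the same input:
    conv_params = 3 * 3                    # 9 weights
    # Runtime is unchanged by sharing: each output still costs 9 multiplications.
    print(dense_params, conv_params)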
Equivariant representations
• Invariance (under operations g) is a property of a map f: f(g(x)) = f(x)
• Equivariance (under operations g) is a property of a map f: f(g(x)) = g(f(x))
• Convolution is equivariant to translation: shifting the input shifts the feature map in the same way
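
A small check of translation equivariance; circular convolution and np.roll are used so that the equality holds exactly (signal and kernel are illustrative):

    import numpy as np

    x = np.array([0.0, 1.0, 3.0, 2.0, 0.0, 0.0])
    k = np.array([1.0, -1.0, 0.0])

    def circ_conv(x, k):
        # circular convolution via the DFT (exact under periodic boundaries)
        return np.real(np.fft.ifft(np.fft.fft(x) * np.fft.fft(k, len(x))))

    shift = lambda v: np.roll(v, 2)      # the operation g: translate by 2
    # f(g(x)) == g(f(x)): shifting then filtering equals filtering then shifting
    print(np.allclose(circ_conv(shift(x), k), shift(circ_conv(x, k))))  # True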
Layers in CNNs
• each layer consists of three stages:
1. convolutions to compute linear activations
2. detector stage with rectified linear activation (ReLU)
3. pooling function
http://www.deeplearningbook.org/
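
A minimal sketch of one such three-stage layer in NumPy/SciPy; the kernel and pooling size are assumptions:

    import numpy as np
    from scipy.signal import convolve2d

    def conv_layer(image, kernel, pool=2):
        z = convolve2d(image, kernel, mode='valid')  # 1. convolution stage
        a = np.maximum(z, 0.0)                       # 2. detector stage (ReLU)
        h = a.shape[0] // pool * pool                # crop to a multiple of pool
        w = a.shape[1] // pool * pool
        a = a[:h, :w].reshape(h // pool, pool, w // pool, pool)
        return a.max(axis=(1, 3))                    # 3. pooling stage (max)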
Pooling
• summary statistics of nearby outputs:
– max pooling [Zhou & Chellappa, 1988]: maximum output in a rectangular region
– average over a rectangular region
– L2 norm of a rectangular region
– weighted average (based on distance from the central position)
• approximately invariant to small translations
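
The summary statistics above, sketched for a single rectangular region (the values are made up):

    import numpy as np

    region = np.array([[0.1, 0.9],
                       [0.4, 0.2]])        # one 2x2 pooling region
    print(region.max())                    # max pooling: 0.9
    print(region.mean())                   # average pooling: 0.4
    print(np.sqrt((region ** 2).sum()))    # L2-norm pooling: ~1.01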
Strides
• pool s pixels apart instead of at every pixel (stride s)
http://www.deeplearningbook.org/
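
Strided pooling is pooling evaluated only at every s-th position; a 1D sketch with assumed stride and window size:

    import numpy as np

    x = np.array([3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0])
    s, w = 2, 3                            # stride and pooling window (assumed)
    pooled = [x[i:i + w].max() for i in range(0, len(x) - w + 1, s)]
    print(pooled)                          # [4.0, 5.0, 9.0]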
Spatio-featural uncertainty
• Fundamental question: what are the resolution limits?
• Uncertainty relation [Felsberg, 2009]
(figure: feature maps at layers 1, 2, and 3)
Mathematical formulations
• 3D observed data V, 3D output Z, 4D kernel K:
  Z_{i,j,k} = \sum_{l,m,n} V_{l, j+m-1, k+n-1} K_{i,l,m,n}
• convolution with stride s:
  Z_{i,j,k} = c(K, V, s)_{i,j,k} = \sum_{l,m,n} V_{l, (j-1)s+m, (k-1)s+n} K_{i,l,m,n}
• stride vs. sequential convolution and downsampling (cf. filter banks / wavelets): a strided convolution computes the same values as full convolution followed by downsampling, but skips the discarded positions
http://www.deeplearningbook.org/
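
A quick NumPy check that stride-s convolution computes exactly the conv-then-downsample values (1D, illustrative data; np.convolve flips the kernel, hence k[::-1] in the direct sum):

    import numpy as np

    x = np.random.default_rng(0).standard_normal(10)
    k = np.array([0.25, 0.5, 0.25])
    s = 2

    full = np.convolve(x, k, mode='valid')   # convolve at every position...
    strided = full[::s]                      # ...then keep every s-th sample
    direct = np.array([np.dot(x[i:i + len(k)], k[::-1])
                       for i in range(0, len(x) - len(k) + 1, s)])
    print(np.allclose(strided, direct))      # True: the skipped work is wasted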
Zero-padding
http://www.deeplearningbook.org/
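
The padding options map onto np.convolve's modes; a small illustration (signal and kernel assumed):

    import numpy as np

    x = np.ones(6)
    k = np.array([1.0, 2.0, 1.0])

    print(np.convolve(x, k, mode='valid').shape)  # (4,) no padding: output shrinks
    print(np.convolve(x, k, mode='same').shape)   # (6,) zero-pad to keep the size
    print(np.convolve(x, k, mode='full').shape)   # (8,) zero-pad on both sides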
Overview of options
(figure: local connections with unshared weights vs. local connections with shared weights, i.e. convolution)
Parenthesis: optimization
• Minimize a general objective J that depends on the convolution output Z = c(K, V, s)
• Train decoder: the gradient with respect to the input V is again a convolution-like operation h(K, G, s), with G the upstream gradient (transposed convolution)
• Train encoder: the gradient with respect to the kernel K is likewise a convolution-like operation g(G, V, s)
• The equalities are obtained by exploiting the linearity of c, g, h
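
In the matrix view from the Toeplitz slide, if the forward pass is Z = W v, the gradient with respect to the input is Wᵀ G; a sketch with made-up sizes (the general statement matters, not these numbers):

    import numpy as np
    from scipy.linalg import toeplitz

    k = np.array([1.0, -1.0])                      # kernel (illustrative)
    n = 4
    W = toeplitz(np.concatenate([k, np.zeros(n - 1)]),
                 np.concatenate([[k[0]], np.zeros(n - 1)]))  # conv as matrix

    G = np.arange(1.0, float(n + len(k)))          # upstream gradient dJ/dZ (assumed)

    # J linear in v  =>  dJ/dv = W^T G: the backward pass multiplies by the
    # transpose, which is why it is called "transposed convolution"
    print(W.T @ G)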
Bias terms
• Locally connected, unshared: each unit has its own bias
• Tiled convolution: share biases in the same tiling pattern as the kernels
• Shared (standard) convolution: either
– share one bias per feature map, or
– use a separate bias at each location, to compensate for differences in image statistics across the image
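
Bias shapes under the two sharing schemes, as a sketch (the feature-map size is assumed):

    import numpy as np

    H, W, C = 28, 28, 16                      # feature-map size (assumed)
    shared_bias = np.zeros(C)                 # one bias per output channel
    per_location_bias = np.zeros((H, W, C))   # separate bias at every position
    print(shared_bias.size, per_location_bias.size)  # 16 12544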
Structured output
Tasks addressed so far:
1. Classification: a class label
2. Regression: real value(s)
Alternative: a structured object as output, e.g.
– segmentation
– pixel-wise labelling
http://www.deeplearningbook.org/
• “bad sign” if a CNN does not learn some edge detector
Michael Felsberg
michael.felsberg@liu.se
www.liu.se