
Lecture 3:

Convolutional Networks
Neural Networks with Applications to Vision and Language
Michael Felsberg, Computer Vision Laboratory, Department of Electrical Engineering
Marco Kuhlmann, Natural Language Processing Lab, Department of Computer Science
M. Felsberg: Neural Networks with Applications to Vision and Language / Convolutional Networks 2

Convolutional (neural) networks


• CNN [LeCun, 1989]
• suitable for data with known, grid-like topology
– time series
– images

“Convolutional networks are simply neural networks that use convolution in place of general matrix multiplication in at least one of their layers.”

Data types
• 1D – single channel: audio waveform; multi-channel: state vector of an animated stick man
• 2D – single channel: phase space of an audio signal (time and frequency axes); multi-channel: color image
• 3D – single channel: density data from a CT scan; multi-channel: color video

• input and output can be of fixed or variable size
• a pooling layer enables transitions between sizes, cf. example p 16

Convolution

http://bmia.bmt.tue.nl/education/courses/fev/course/notebooks/Convolution.html

Comments
• convolution is much more general
– over any field, not just the real numbers
– in any dimensionality, not just 1D: tensors
– also on non-flat domains, e.g. spheres
– not just shifts, e.g. rotations
• terminology: filter (mask), impulse response, kernel
• output: response, feature map
• commutative; related to cross-correlation by flipping the kernel
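Both properties are easy to check numerically; a minimal NumPy sketch (signal and kernel values chosen arbitrarily for illustration):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])   # arbitrary signal
k = np.array([1.0, 0.0, -1.0])       # arbitrary kernel

# convolution flips the kernel; cross-correlation does not,
# so correlating with the flipped kernel gives the same result
conv = np.convolve(x, k, mode="valid")
corr_flipped = np.correlate(x, k[::-1], mode="valid")
assert np.allclose(conv, corr_flipped)

# convolution is commutative
assert np.allclose(np.convolve(x, k), np.convolve(k, x))
```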

Comments
• the linear space of integrable functions with the product
given by convolution is a commutative algebra
• known from signal processing: filter banks (analyzing and
synthesizing) – subspace projections
• dimensionality examples
– color images: 3D tensors (2 spatial coordinates, colors)
– batch: 4D tensors (4th: example index)

Algorithmic
• flipping irrelevant for
learned coefficients
• 1D convolution:
Toeplitz matrix
• 2D convolution:
doubly block circulant
• sparse
• boundary conditions
(valid, reflective,
periodic, zeros)
http://www.deeplearningbook.org/
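For illustration, valid 1D convolution can be written as multiplication with a sparse, banded Toeplitz matrix; a small sketch with arbitrary values:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
k = np.array([0.5, 1.0, 0.25])

# each row of the Toeplitz matrix holds the flipped kernel, shifted by one;
# all other entries are zero, which is the sparsity noted above
n_out = len(x) - len(k) + 1
T = np.zeros((n_out, len(x)))
for i in range(n_out):
    T[i, i:i + len(k)] = k[::-1]

# the matrix-vector product equals "valid" convolution
assert np.allclose(T @ x, np.convolve(x, k, mode="valid"))
```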

Motivation for CNNs

1. sparse (and local) interaction


2. parameter sharing
3. equivariant representations

Sparse (and local) interaction


• kernel smaller than the input

http://www.deeplearningbook.org/

Sparse (and local) interaction


• kernel smaller than the input
– fewer parameters
– lower memory
requirements
– better statistical
efficiency
– fewer operations
• with increased depth, units become indirectly connected to all of the input
http://www.deeplearningbook.org/
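A back-of-the-envelope comparison makes the parameter savings concrete (layer sizes here are hypothetical, chosen for illustration only):

```python
# hypothetical layer sizes, for illustration only
n_in, n_out, kernel_width = 1000, 1000, 5

# fully connected: one weight per input-output pair
dense_params = n_in * n_out
# convolutional with shared weights: one small kernel
conv_params = kernel_width

assert dense_params == 1_000_000
assert conv_params == 5
```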

Parameter sharing
• tied weights
• reduced storage
requirements
• but same time
complexity
• sometimes sharing
should be limited,
e.g. cropped images

http://www.deeplearningbook.org/

Equivariant representations
• Invariance (under operations g) is the property f(g(x)) = f(x)
• Equivariance is the property f(g(x)) = g(f(x))
• easy for discrete shift operations
• more involved for sub-pixel shifts and rotations
• tricky for scaling
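Shift equivariance of convolution can be verified directly for the easy discrete case; a sketch using periodic (circular) convolution via the FFT, with arbitrary values:

```python
import numpy as np

x = np.arange(8.0)                    # arbitrary signal
k = np.array([1.0, -1.0, 0.5])        # arbitrary kernel

def circ_conv(v, k):
    # periodic (circular) convolution computed via the FFT
    return np.real(np.fft.ifft(np.fft.fft(v) * np.fft.fft(k, len(v))))

# equivariance: convolving the shifted input equals shifting the convolved input
shifted_then_conv = circ_conv(np.roll(x, 2), k)
conv_then_shifted = np.roll(circ_conv(x, k), 2)
assert np.allclose(shifted_then_conv, conv_then_shifted)
```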

Layers in CNNs
• each layer consists of
three stages:
1. convolutions to
compute linear
activation
2. detector stage with
rectified linear
activation
3. pooling function

http://www.deeplearningbook.org/

Pooling
• summary statistics of nearby outputs
– max pooling [Zhou&Chellappa, 1988]
maximum output in rectangular region
– average in rectangular region
– L2 norm of rectangular region
– weighted average
(based on distance from central position)
• approximately invariant to small translations
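The listed pooling variants differ only in the summary statistic applied per window; a minimal 1D sketch (window width, stride, and values are illustrative):

```python
import numpy as np

def pool1d(x, width, stride, op=np.max):
    # summary statistic `op` over windows of `width` samples, `stride` apart
    return np.array([op(x[i:i + width])
                     for i in range(0, len(x) - width + 1, stride)])

x = np.array([1.0, 3.0, 2.0, 5.0, 4.0, 0.0])
assert np.allclose(pool1d(x, 2, 2, np.max), [3.0, 5.0, 4.0])    # max pooling
assert np.allclose(pool1d(x, 2, 2, np.mean), [2.0, 3.5, 2.0])   # average pooling
```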

Pooling and invariance
• pooling corresponds to an infinitely strong prior
• risk of underfitting
• induces topological knowledge
http://www.deeplearningbook.org/

Strides
• pooling s pixels apart instead of every pixel (stride s)

http://www.deeplearningbook.org/

– improved statistical efficiency


– reduced memory requirements
– handling inputs of varying size
– but: pooling & strides complicate top-down
processing (e.g. autoencoders)


Spatio-featural uncertainty

• Fundamental question: what are the resolution limits?
• Uncertainty relation [Felsberg, 2009]
(figure: spatial vs. featural resolution in layers 1–3)

Mathematical formulations
• 3D observed data V, 3D output Z, 4D kernel K
• Introduce stride s (downsampling): Z = c(K, V, s), with
  Z_{i,j,k} = Σ_{l,m,n} V_{l,(j−1)s+m,(k−1)s+n} K_{i,l,m,n}
• Note that V needs to be zero-padded (size of the convolution output: valid/same/full)
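The three padding conventions fix the output size; NumPy's 1D `np.convolve` exposes them directly as modes (input and kernel lengths here are arbitrary):

```python
import numpy as np

x = np.zeros(10)   # input of length 10
k = np.zeros(3)    # kernel of width 3

assert len(np.convolve(x, k, mode="valid")) == 10 - 3 + 1   # no zero-padding
assert len(np.convolve(x, k, mode="same")) == 10            # output size == input size
assert len(np.convolve(x, k, mode="full")) == 10 + 3 - 1    # maximal zero-padding
```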

Stride vs. sequential convolution and downsampling (cf. filter banks/wavelets)

http://www.deeplearningbook.org/
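The equivalence between the two views can be checked numerically: a strided convolution computes exactly the full-rate convolution followed by downsampling, while skipping the discarded positions (values below are arbitrary):

```python
import numpy as np

def conv1d_strided(x, k, s):
    # "valid" convolution, evaluated only at every s-th position
    kf = k[::-1]
    return np.array([x[i:i + len(k)] @ kf
                     for i in range(0, len(x) - len(k) + 1, s)])

x = np.random.default_rng(0).normal(size=12)
k = np.array([0.25, 0.5, 0.25])

# strided convolution == full-rate convolution followed by downsampling
assert np.allclose(conv1d_strided(x, k, 2),
                   np.convolve(x, k, mode="valid")[::2])
```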

Zero-padding

http://www.deeplearningbook.org/

Mathematical formulations contd.


• Unshared convolution – shift variant 6D kernel W

• Example: face detection (specific positions of eyes etc.)


• Tiled convolution – set of 4D kernels Ku,v (% modulo)

• Learning of invariance (see e.g. example ‘5’)



Overview of options
• local connections, unshared
• local connections, shared
• local connections, tiled
• full connections

http://www.deeplearningbook.org/

Operations for learning


Assume some loss function J(V, K) to be minimized
1. Gradient with respect to the kernel (backprop from output to weights)
2. Gradient with respect to the input (backprop from output to inputs)
Elementary ingredient: the transpose of the forward operator (after flattening the input)
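The role of the transposed operator can be sketched for valid 1D convolution: writing the forward pass as a matrix (as on the Toeplitz slide), backprop from output to input is multiplication by its transpose, which coincides with a full convolution of the gradient with the flipped kernel. Kernel and gradient values below are hypothetical:

```python
import numpy as np

x_len = 5
k = np.array([0.5, -0.25, 1.0])
n_out = x_len - len(k) + 1

# forward operator of "valid" convolution as a matrix (input flattened)
T = np.zeros((n_out, x_len))
for i in range(n_out):
    T[i, i:i + len(k)] = k[::-1]

g = np.array([1.0, -1.0, 2.0])   # hypothetical upstream gradient dJ/dZ
grad_input = T.T @ g             # backprop to the input via the transpose

# equivalently: a "full" convolution of the gradient with the flipped kernel
assert np.allclose(grad_input, np.convolve(g, k[::-1], mode="full"))
```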

Parenthesis: optimization
• Minimize general objective that depends on Z

where the tilde indicates a suitable reshape


• Derivatives of J include (chain rule)

• Output of transpose depends on zero-padding and stride


• Relation to PCA (for autoencoders) applies strictly only for matrices with orthonormal rows
• Indices change semantics (but we stick to the book’s notation)

Technical details for backprop


• Assume strided convolution Z = c(K, V, s)
• Given in each step: the tensor G of loss gradients, G_{i,j,k} = ∂J(V, K)/∂Z_{i,j,k}
• Using this tensor, we obtain the gradient with respect to the kernel, g(G, V, s), and with respect to the input, h(K, G, s)
• Note that c, g, h are linear in K, V, G!

Autoencoders and reconstruction


• Use h() for generating the transpose of c()
• Assume hidden units H replacing G and compute an approximation Ṽ of V (with objective J(K, H))
• Train the autoencoder, then the decoder, then the encoder
• Equalities are obtained by exploiting the linearity of c, g, h

Bias terms
• Locally connected, unshared – each unit has its own bias
• Tiled convolution – share biases in the tiling pattern
• Shared convolution
– share one bias, or
– separate bias at each location, to compensate for differences in image statistics

Structured output
Tasks addressed
1. Classification – class label
2. Regression – real value(s)
Alternative: structured object as output
– segmentation
– pixel-wise labelling

How to achieve structured output?


• Avoid pooling – pixel-dense output
• Emit a lower-resolution grid of labels
• Pooling with stride 1
• Repeated refinement steps (recurrent network)
• Use of graphical models
http://www.deeplearningbook.org/

Making convolution efficient


• Parallel computation resources (GPU)
• Clever convolution algorithms
– Fourier transform (point-wise multiplication)
– separability (a sequence of 1D convolutions reduces complexity and parameter count)
• Deployment of the network is often more relevant than training
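Separability is easy to demonstrate: a rank-1 2D kernel (an outer product of two 1D factors) gives the same result as two 1D passes. A sketch with an arbitrary image and a Sobel-like separable kernel:

```python
import numpy as np

rng = np.random.default_rng(0)
img = rng.normal(size=(6, 6))
u = np.array([1.0, 2.0, 1.0])   # column factor of a separable kernel
v = np.array([1.0, 0.0, -1.0])  # row factor (Sobel-like example)
K = np.outer(u, v)              # rank-1 (separable) 2D kernel

def conv2d_valid(x, k):
    kh, kw = k.shape
    kf = k[::-1, ::-1]          # flip both axes for true convolution
    return np.array([[np.sum(x[i:i + kh, j:j + kw] * kf)
                      for j in range(x.shape[1] - kw + 1)]
                     for i in range(x.shape[0] - kh + 1)])

# two 1D passes (along rows, then along columns) reproduce the 2D convolution
rows = np.apply_along_axis(lambda r: np.convolve(r, v, mode="valid"), 1, img)
sep = np.apply_along_axis(lambda c: np.convolve(c, u, mode="valid"), 0, rows)
assert np.allclose(sep, conv2d_valid(img, K))
```

For a K×K kernel this replaces K² multiplications per output with 2K, and stores 2K parameters instead of K².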

Unsupervised or random features


• Most expensive: learning features
• Three strategies to avoid supervised training
– random features (choice of architecture)
– hand-designed features
– unsupervised training of features (determine
features separately from classification layer)
• Approximate strategy: greedy layer-wise pretraining

The link to neuroscience


• The work by Hubel & Wiesel opened up the analysis of V1 (primary visual cortex)
– spatial map with 2D structure ~ 2D feature maps
– simple cells ~ linear function of spatial region
– complex cells ~ pooling units with invariance
• Current CNNs span pathway retina-LGN-V1-V2-V4-IT
• BUT: brain uses top-down feedback
• Also: foveal resolution and saccades without counterpart

Reverse correlation and Gabor functions


• Stimulus: noise
• Linear model of responses: approximates the neuron’s weights
• Often identified with Gabor functions (complex wavelets with varying coordinate system, scale, frequency, and phase)
http://www.deeplearningbook.org/
• It is a “bad sign” if a CNN does not learn some edge detector
Michael Felsberg
michael.felsberg@liu.se

www.liu.se
