Convolutional Networks
Neural Networks with Applications to Vision and Language
Michael Felsberg, Computer Vision Laboratory, Department of Electrical Engineering
Marco Kuhlmann, Natural Language Processing Lab, Department of Computer Science
Data types
      Single channel                            Multi-channel
1D    audio waveform                            state vector of an animated stick man
2D    phase space of an audio signal            color image
      (time and frequency axes)
3D    density data from a CT scan               color video
Convolution
http://bmia.bmt.tue.nl/education/courses/fev/course/notebooks/Convolution.html
Comments
• convolution is much more general:
– over any field, not just the real numbers
– in any dimensionality, not just 1D: tensors
– also on non-flat domains, e.g. spheres
– not just for shifts, e.g. also rotations
• terminology: filter (filter mask), impulse response, kernel
• output: response, feature map
• commutative; related to cross-correlation by flipping the kernel
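
For reference, the standard discrete 2D definitions (added here; they are not spelled out on the slide) make the flipping relation explicit:

    (I * K)(i, j)     = \sum_{m,n} I(m, n) K(i - m, j - n)    (convolution)
    (I \star K)(i, j) = \sum_{m,n} I(i + m, j + n) K(m, n)    (cross-correlation)

Convolution thus equals cross-correlation with a flipped kernel.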
Comments
• the linear space of integrable functions, with the product given by convolution, forms a commutative algebra
• known from signal processing: filter banks (analysis and synthesis) as subspace projections
• dimensionality examples:
– color images: 3D tensors (2 spatial coordinates, 1 channel coordinate)
– batches: 4D tensors (4th axis: example index)
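
A minimal sketch of these conventions in NumPy; the sizes and the channels-last (NHWC) axis order are illustrative assumptions, frameworks differ:

    import numpy as np

    image = np.zeros((224, 224, 3))      # one color image: 2 spatial axes + channels
    batch = np.zeros((32, 224, 224, 3))  # batch of 32 images: example index as 4th axis
    print(image.ndim, batch.ndim)        # 3 4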
Algorithmic
• kernel flipping is irrelevant for learned coefficients
• 1D convolution: Toeplitz matrix
• 2D convolution: doubly block circulant matrix
• sparse matrices
• boundary conditions (valid, reflective, periodic, zeros)
http://www.deeplearningbook.org/
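
A hedged sketch of the Toeplitz-matrix view of 1D convolution; the signal and kernel are made up, and scipy.linalg.toeplitz does the bookkeeping:

    import numpy as np
    from scipy.linalg import toeplitz

    x = np.array([1.0, 2.0, 3.0, 4.0])   # input signal (illustrative)
    k = np.array([1.0, -1.0])            # kernel (illustrative)

    # Full convolution as a sparse, banded Toeplitz matrix: the kernel runs
    # down the first column, the first row is kernel[0] followed by zeros.
    col = np.concatenate([k, np.zeros(len(x) - 1)])
    row = np.concatenate([[k[0]], np.zeros(len(x) - 1)])
    T = toeplitz(col, row)               # shape (len(x)+len(k)-1, len(x))

    print(T @ x)                         # matches np.convolve(x, k)
    print(np.convolve(x, k))             # [ 1.  1.  1.  1. -4.]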
Motivation for CNNs
http://www.deeplearningbook.org/
Parameter sharing
• tied weights
• reduced storage requirements
• but same time complexity
• sometimes sharing should be limited, e.g. for cropped images
http://www.deeplearningbook.org/
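
A back-of-the-envelope comparison of parameter counts; the layer sizes are made up for illustration:

    # Fully connected layer mapping a 32x32 input to a 32x32 output:
    dense_params = (32 * 32) * (32 * 32)   # 1,048,576 weights
    # Convolutional layer with one shared 3x3 kernel over the same input:
    conv_params = 3 * 3                    # 9 weights
    # Runtime is unchanged by sharing: each output still costs 9 multiplications.
    print(dense_params, conv_params)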
Equivariant representations
• Invariance (under operations g) is a property of a map f: f(g(x)) = f(x)
• Equivariance (under operations g) is a property of a map f: f(g(x)) = g(f(x))
• Convolution is equivariant to translation: shifting the input shifts the feature map in the same way
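
A small check of translation equivariance; circular convolution and np.roll are used so that the equality holds exactly (signal and kernel are illustrative):

    import numpy as np

    x = np.array([0.0, 1.0, 3.0, 2.0, 0.0, 0.0])
    k = np.array([1.0, -1.0, 0.0])

    def circ_conv(x, k):
        # circular convolution via the DFT (exact under periodic boundaries)
        return np.real(np.fft.ifft(np.fft.fft(x) * np.fft.fft(k, len(x))))

    shift = lambda v: np.roll(v, 2)      # the operation g: translate by 2
    # f(g(x)) == g(f(x)): shifting then filtering equals filtering then shifting
    print(np.allclose(circ_conv(shift(x), k), shift(circ_conv(x, k))))  # True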
Layers in CNNs
• each layer consists of three stages:
1. convolutions to compute linear activations
2. detector stage with rectified linear activation (ReLU)
3. pooling function
http://www.deeplearningbook.org/
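
A minimal sketch of one such three-stage layer in NumPy/SciPy; the kernel and pooling size are assumptions:

    import numpy as np
    from scipy.signal import convolve2d

    def conv_layer(image, kernel, pool=2):
        z = convolve2d(image, kernel, mode='valid')  # 1. convolution stage
        a = np.maximum(z, 0.0)                       # 2. detector stage (ReLU)
        h = a.shape[0] // pool * pool                # crop to a multiple of pool
        w = a.shape[1] // pool * pool
        a = a[:h, :w].reshape(h // pool, pool, w // pool, pool)
        return a.max(axis=(1, 3))                    # 3. pooling stage (max)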
Pooling
• summary statistics of nearby outputs:
– max pooling [Zhou & Chellappa, 1988]: maximum output in a rectangular region
– average over a rectangular region
– L2 norm of a rectangular region
– weighted average (based on distance from the central position)
• approximately invariant to small translations
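
The summary statistics above, sketched for a single rectangular region (the values are made up):

    import numpy as np

    region = np.array([[0.1, 0.9],
                       [0.4, 0.2]])        # one 2x2 pooling region
    print(region.max())                    # max pooling: 0.9
    print(region.mean())                   # average pooling: 0.4
    print(np.sqrt((region ** 2).sum()))    # L2-norm pooling: ~1.01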
Strides
• pool s pixels apart instead of at every pixel (stride s)
http://www.deeplearningbook.org/
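
Strided pooling is pooling evaluated only at every s-th position; a 1D sketch with assumed stride and window size:

    import numpy as np

    x = np.array([3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0])
    s, w = 2, 3                            # stride and pooling window (assumed)
    pooled = [x[i:i + w].max() for i in range(0, len(x) - w + 1, s)]
    print(pooled)                          # [4.0, 5.0, 9.0]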
Spatio-featural uncertainty
• Fundamental question: what are the resolution limits?
• Uncertainty relation [Felsberg, 2009]
(figure: feature maps at layers 1, 2, and 3)
Mathematical formulations
• 3D observed data V, 3D output Z, 4D kernel K:
  Z_{i,j,k} = \sum_{l,m,n} V_{l, j+m-1, k+n-1} K_{i,l,m,n}
• convolution with stride s:
  Z_{i,j,k} = c(K, V, s)_{i,j,k} = \sum_{l,m,n} V_{l, (j-1)s+m, (k-1)s+n} K_{i,l,m,n}
• stride vs. sequential convolution and downsampling (cf. filter banks / wavelets): a strided convolution computes the same values as full convolution followed by downsampling, but skips the discarded positions
http://www.deeplearningbook.org/
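
A quick NumPy check that stride-s convolution computes exactly the conv-then-downsample values (1D, illustrative data; np.convolve flips the kernel, hence k[::-1] in the direct sum):

    import numpy as np

    x = np.random.default_rng(0).standard_normal(10)
    k = np.array([0.25, 0.5, 0.25])
    s = 2

    full = np.convolve(x, k, mode='valid')   # convolve at every position...
    strided = full[::s]                      # ...then keep every s-th sample
    direct = np.array([np.dot(x[i:i + len(k)], k[::-1])
                       for i in range(0, len(x) - len(k) + 1, s)])
    print(np.allclose(strided, direct))      # True: the skipped work is wasted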
Zero-padding
http://www.deeplearningbook.org/
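
The padding options map onto np.convolve's modes; a small illustration (signal and kernel assumed):

    import numpy as np

    x = np.ones(6)
    k = np.array([1.0, 2.0, 1.0])

    print(np.convolve(x, k, mode='valid').shape)  # (4,) no padding: output shrinks
    print(np.convolve(x, k, mode='same').shape)   # (6,) zero-pad to keep the size
    print(np.convolve(x, k, mode='full').shape)   # (8,) zero-pad on both sides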
Overview of options
(figure: local connections with unshared weights vs. local connections with shared weights, i.e. convolution)
Parenthesis: optimization
• Minimize a general objective J that depends on the convolution output Z = c(K, V, s)
• Train decoder: the gradient with respect to the input V is again a convolution-like operation h(K, G, s), with G the upstream gradient (transposed convolution)
• Train encoder: the gradient with respect to the kernel K is likewise a convolution-like operation g(G, V, s)
• The equalities are obtained by exploiting the linearity of c, g, h
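
In the matrix view from the Toeplitz slide, if the forward pass is Z = W v, the gradient with respect to the input is Wᵀ G; a sketch with made-up sizes (the general statement matters, not these numbers):

    import numpy as np
    from scipy.linalg import toeplitz

    k = np.array([1.0, -1.0])                      # kernel (illustrative)
    n = 4
    W = toeplitz(np.concatenate([k, np.zeros(n - 1)]),
                 np.concatenate([[k[0]], np.zeros(n - 1)]))  # conv as matrix

    G = np.arange(1.0, float(n + len(k)))          # upstream gradient dJ/dZ (assumed)

    # J linear in v  =>  dJ/dv = W^T G: the backward pass multiplies by the
    # transpose, which is why it is called "transposed convolution"
    print(W.T @ G)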
Bias terms
• Locally connected, unshared: each unit has its own bias
• Tiled convolution: share biases in the same tiling pattern as the kernels
• Shared (standard) convolution: either
– share one bias per feature map, or
– use a separate bias at each location, to compensate for differences in image statistics across the image
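
Bias shapes under the two sharing schemes, as a sketch (the feature-map size is assumed):

    import numpy as np

    H, W, C = 28, 28, 16                      # feature-map size (assumed)
    shared_bias = np.zeros(C)                 # one bias per output channel
    per_location_bias = np.zeros((H, W, C))   # separate bias at every position
    print(shared_bias.size, per_location_bias.size)  # 16 12544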
Structured output
Tasks addressed so far:
1. Classification: a class label
2. Regression: real value(s)
Alternative: a structured object as output, e.g.
– segmentation
– pixel-wise labelling
http://www.deeplearningbook.org/
• “bad sign” if a CNN does not learn some edge detector
Michael Felsberg
michael.felsberg@liu.se
www.liu.se