
Convolutional Neural Networks (CNN)
CNNs are the state of the art for image processing / deep learning with images as input

they work by applying filters to the input data; essentially, the filter values take the role of the weights, and convolving different kernels with the input is what produces the filter effect

two main ideas:

give the NN a better structure: instead of connecting everything with everything, connect neurons of one layer only with neighboring neurons of the next layer

use the same weights for different parts of the image; intuitively, if a feature is interesting in one part of an image, it will probably also be interesting in another part

Convolutions
convolve (German: falten, "to fold"): applying a filter to a function; filter in the sense of a matrix/grid of values that alters the output of a given function

Discrete Case: Box Filter

Slide the filter kernel from left to right, multiplying and summing over all overlapping fields

applying the same filter to all pixels of an image is the idea of weight sharing

handling the border fields where the kernel overlaps the edge: either ignore them → the image shrinks; or use padding: add 0s so that a value can be computed (see the sketch below)
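A minimal sketch of this in NumPy (not from the original notes; signal and filter values are made up). Strictly, convolution flips the kernel, but for a symmetric box filter this makes no difference:

```python
import numpy as np

def convolve1d(signal, kernel, pad=False):
    """Slide the kernel over the signal, multiplying and summing overlaps."""
    if pad:  # zero-padding: add 0s at the borders so no values are lost
        signal = np.pad(signal, len(kernel) // 2)
    n = len(signal) - len(kernel) + 1
    return np.array([signal[i:i + len(kernel)] @ kernel for i in range(n)])

x = np.array([0., 0., 1., 1., 1., 0., 0.])
box = np.ones(3) / 3                 # box filter: a local average
print(convolve1d(x, box))            # 5 values -> the output shrinks
print(convolve1d(x, box, pad=True))  # 7 values -> size preserved
```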

Convolution on Images

An image of 5x5 convolved with a filter of size 3x3 generates an output of size 3x3

example computation, reduced to the actually necessary terms (the zeros drop out):

3 · 0 + 3 · (−1) + 5 · 0 + 1 · (−1) + 4 · 5 + 4 · (−1) + 7 · 0 + 9 · (−1) + (−1) · 0
= 3 · (−1) + 1 · (−1) + 4 · 5 + 4 · (−1) + 9 · (−1)
= −3 − 1 + 20 − 4 − 9 = 20 − 17 = 3
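The same computation in NumPy; the patch and kernel values are taken from the example above (the kernel is a sharpening filter whose zeros make most products drop out):

```python
import numpy as np

patch = np.array([[3, 3, 5],
                  [1, 4, 4],
                  [7, 9, -1]])      # 3x3 image patch from the example
kernel = np.array([[ 0, -1,  0],
                   [-1,  5, -1],
                   [ 0, -1,  0]])   # filter; the zeros remove most terms

print((patch * kernel).sum())       # 3, as computed by hand above
```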
Image filter examples → this is exactly how filters are applied by any image-altering application, e.g. Instagram


In CNNs these filters represent the weights of the network

Convolutions on RGB Images

images have depth: due to the RGB split we have 3 channels

depth dimension of image must match depth of filter (convolutional kernel)

same procedure as before: slide the filter over the image and apply it through a dot product at every position, resulting in z_i = w^T x_i + b, where w and x_i are (5·5·3)×1 vectors (the weights w represent the filter); note that the output z_i is a scalar (dimension 1)

Example: a 32 x 32 x 3 image results in a 28 x 28 output image without padding
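A sketch of the RGB case in NumPy (random values; the explicit loop is for clarity, real implementations vectorize this): at each position the flattened 5x5x3 patch x_i is dotted with the flattened filter w, giving the scalar z_i = w^T x_i + b.

```python
import numpy as np

image = np.random.rand(32, 32, 3)   # height x width x RGB channels
w = np.random.rand(5 * 5 * 3)       # one 5x5x3 filter, flattened to (75,)
b = 0.1                             # bias

out = np.empty((28, 28))            # 32 - 5 + 1 = 28 without padding
for i in range(28):
    for j in range(28):
        x_i = image[i:i + 5, j:j + 5, :].ravel()  # (5*5*3)-dim patch
        out[i, j] = w @ x_i + b                   # z_i = w^T x_i + b, a scalar
print(out.shape)                    # (28, 28)
```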


Convolution Layer
def.: applying different filters to the same image; every filter we apply to the image creates a new slice (feature map) of the output, e.g. applying 2 filters to a 32 x 32 x 3 image results in a 28 x 28 x 2 output volume

a layer is defined by its filter width & height; the filter depth is implicitly given by the input depth, since the dot product runs over the full depth

the number of output slices is defined by the number of different weight sets (i.e. filters)

each filter captures a different image characteristic, e.g. horizontal/vertical edges, circles, squares, etc.

Dimensions of Convolutional Layers - Examples

the stride triggers a jump of the filter, e.g. stride = 2 moves the filter 2 pixels at a time


without padding, the outputs shrink with every layer, which is not a good idea

padding ensures that corner pixels are considered as well and that image sizes don't shrink as quickly as they otherwise would → most common padding: zero-padding, leading to output size:

((N + 2·P − F) / S + 1) × ((N + 2·P − F) / S + 1)
N: width of the image

F: width of the filter

P: amount of padding; usually set to P = (F − 1) / 2

S: stride

number of parameters (weights): each number in a filter is one weight, i.e. a 5x5x3 filter has 5·5·3 + 1 = 76 parameters (+1 for the bias of every filter); if we apply 10 filters we have a total of 76 · 10 = 760 parameters
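Both formulas as a small helper (a sketch, not from the notes):

```python
def conv_output_size(N, F, P=0, S=1):
    """Output width/height: (N + 2P - F) / S + 1."""
    return (N + 2 * P - F) // S + 1

def conv_params(F, depth, n_filters):
    """Each filter has F*F*depth weights plus 1 bias."""
    return (F * F * depth + 1) * n_filters

print(conv_output_size(32, 5))       # 28: the image shrinks without padding
print(conv_output_size(32, 5, P=2))  # 32: "same" padding with P = (F - 1) / 2
print(conv_params(5, 3, 10))         # 760, as in the example above
```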

Exam Example Question


Convolutional Neural Network (CNN)


concatenation of convolutional layers and activations

Pooling
another operator heavily used in CNNs

using padding ensures that the images don't shrink as we apply the filters; pooling allows us to shrink the images nevertheless, but only when required → reducing the feature map size

pooling is downsampling, usually by a factor of 2

Different ways:

Max Pooling: define equally sized regions within the input, then create a new pooled output consisting of the highest number from each corresponding input region (see the sketch below)


if a region contains more than one occurrence of the highest number, just take either one

Average Pooling: average all values of a region instead of taking the maximum value

conv layer = feature extraction, computing a feature in a given region; pooling layer = feature selection, picking the strongest activation in a region

most common pooling setting: 2 x 2, e.g. an image of 200x200 results in 100x100
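A minimal 2x2 pooling sketch in NumPy (the example matrix is made up):

```python
import numpy as np

def pool2x2(x, mode="max"):
    """Split x into 2x2 regions; keep the max (or mean) of each region."""
    h, w = x.shape
    regions = x[:h - h % 2, :w - w % 2].reshape(h // 2, 2, w // 2, 2)
    return regions.max(axis=(1, 3)) if mode == "max" else regions.mean(axis=(1, 3))

x = np.array([[1., 3., 2., 4.],
              [5., 6., 1., 0.],
              [7., 2., 9., 8.],
              [3., 4., 1., 2.]])
print(pool2x2(x))          # [[6. 4.] [7. 9.]] -> downsampled by 2
print(pool2x2(x, "mean"))  # average pooling instead
```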

Other properties

Example of a network using convolutions, ReLU as the activation function, and pooling to shrink the image size, followed by fully connected layers (a minimal sketch is shown below)
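A minimal PyTorch sketch of such a prototype (all layer sizes are assumed, not from the notes):

```python
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),  # conv: feature extraction
    nn.ReLU(),                                   # activation: non-linearity
    nn.MaxPool2d(2),                             # pooling: 32x32 -> 16x16
    nn.Conv2d(16, 32, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(2),                             # 16x16 -> 8x8
    nn.Flatten(),
    nn.Linear(32 * 8 * 8, 10),                   # FC head for 10 classes
)

x = torch.randn(1, 3, 32, 32)  # one 32x32 RGB image
print(model(x).shape)          # torch.Size([1, 10])
```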


CNN prototype; a fully connected (FC) layer applies brute force, connecting everything with everything, not using shared weights and thus not applying the inductive bias

Convolutions allow us to structure a neural network

Receptive Field
describes the field of input pixels from which a given value of a feature map has been computed (through the chain of dot products)

the deeper one goes into a network, the bigger the receptive field becomes

preferably, use more layers with smaller filters (e.g. 3 layers with 3x3 filters reach the same receptive field as one 7x7 filter), as this also injects more non-linearity (one activation per additional layer) and needs fewer weights → less overfitting
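A back-of-the-envelope receptive-field calculation (a sketch using the standard recurrence, not from the notes):

```python
def receptive_field(filter_sizes, strides=None):
    """Receptive field of stacked conv layers: rf += (F - 1) * jump per layer."""
    strides = strides or [1] * len(filter_sizes)
    rf, jump = 1, 1
    for f, s in zip(filter_sizes, strides):
        rf += (f - 1) * jump
        jump *= s
    return rf

print(receptive_field([3, 3, 3]))  # 7: three 3x3 layers see as much as one 7x7
print(receptive_field([7]))        # 7, but with more weights and less non-linearity
```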


Classic Architectures
LeNet

32x32x1 input: recognition of greyscale images, therefore only 1 as the 3rd dimension; classifies into 10 classes

on a high level: gradually reduce spatial dimensions

Test benchmarks: ImageNet dataset - the ImageNet Large Scale Visual Recognition Competition marked a key milestone in DL

Common performance metrics use top-k scores:

top-1 score: checks whether the sample's class with the highest predicted probability is the same as the target label

top-5 score: checks whether any of the 5 predictions with the highest probability matches the target → top-5 error: percentage of test samples for which the correct class wasn't among the top 5 predicted classes (see the sketch below)
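A sketch of the top-k error in NumPy (shapes and data are made up):

```python
import numpy as np

def topk_error(probs, labels, k=5):
    """Fraction of samples whose true label is not among the k most probable classes."""
    topk = np.argsort(probs, axis=1)[:, -k:]     # indices of the k highest scores
    hit = (topk == labels[:, None]).any(axis=1)  # is the label among them?
    return 1.0 - hit.mean()

probs = np.random.rand(100, 1000)                # 100 samples, 1000 classes
labels = np.random.randint(0, 1000, size=100)
print(topk_error(probs, labels, k=1))            # top-1 error
print(topk_error(probs, labels, k=5))            # top-5 error
```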

AlexNet has about 60 million parameters


1000 outputs for 1000 classes: to get from the spatial 6x6x256 data to class scores, fully connected layers convert the data into 9216 values (= 6·6·256), then 4096, again 4096, and finally 1000

VGGNet simplifies AlexNet by fixing CONV = 3x3 filters with stride 1 and MAXPOOL = 2x2 filters with stride 2

again alternating between CONV & POOL, across 16 layers; again width & height decrease and the number of filters increases as we go deeper, resulting in 138 million parameters

Skip Connections - ResNet

Problem of depth - why don't we simply add more layers? → more and more layers make training harder: gradients explode and vanish


Residual Block - how can we train very deep nets (i.e. more layers) while keeping training stable?

skip connection: taking the output of layer L−1 directly to layer L+1

ResNet Block
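A minimal residual block in PyTorch (a sketch; the original ResNet blocks also use batch normalization and a 1x1 projection when the shapes of the two branches differ):

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1)
        self.relu = nn.ReLU()

    def forward(self, x):
        out = self.conv2(self.relu(self.conv1(x)))
        return self.relu(out + x)  # skip connection: add the input (L-1) back

block = ResidualBlock(64)
print(block(torch.randn(1, 64, 8, 8)).shape)  # shape preserved: [1, 64, 8, 8]
```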


ResNets come with a set of good network design choices - mostly used in computer vision networks to classify images

Why do ResNets work?


if these values (the weights inside the residual branch) become 0, the output of layer L+1 will be equal to that of layer L−1 - the block simply passes its input through, so nothing changes; without such skips, vanishing gradients are the reason why we can't stack an unlimited number of plain layers

1x1 Convolutions
for a single channel it simply scales the input by a constant while keeping the input's spatial dimensions; with several channels it computes a weighted combination of the channels at every position

useful to shrink the number of channels + adds non-linearity (through the following activation), allowing us to learn more complex functions (see the sketch below)
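A 1x1 convolution sketch in PyTorch (channel counts assumed): it shrinks 256 channels to 64 while keeping the spatial dimensions, and the following ReLU adds the non-linearity.

```python
import torch
import torch.nn as nn

reduce = nn.Sequential(
    nn.Conv2d(256, 64, kernel_size=1),  # 1x1 conv: mixes/shrinks channels only
    nn.ReLU(),                          # adds non-linearity
)
x = torch.randn(1, 256, 28, 28)
print(reduce(x).shape)                  # torch.Size([1, 64, 28, 28])
```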

Inception Layer
core idea: too many channels result in huge computational costs; reduce the number of channels with 1x1 convolutions

instead of searching for the perfect filter → choose them all: "same" convolutions with different sizes in parallel + 3x3 max pooling with stride 1, concatenating the results


Computational cost: inserting a 1x1 bottleneck layer (reducing to e.g. 32x32x16) saves a lot of computational effort

GoogLeNet uses inception blocks with an extra max-pool layer added to reduce dimensionality

Xception Net is the extreme version of Inception, applying Depthwise Separable Convolutions instead of normal convolutions; 36 conv layers structured into several modules with skip connections

depthwise separable convolutions use a different filter for each slice of the depth (e.g. one per channel for depth 3) → reduces the number of computations significantly (see the sketch below)
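A depthwise separable convolution sketch in PyTorch (sizes assumed): a per-channel spatial filter (groups = number of channels) followed by a 1x1 pointwise convolution that mixes the channels, with far fewer weights than a normal convolution.

```python
import torch
import torch.nn as nn

depthwise = nn.Conv2d(32, 32, kernel_size=3, padding=1, groups=32)  # one filter per channel
pointwise = nn.Conv2d(32, 64, kernel_size=1)                        # mixes channels

x = torch.randn(1, 32, 56, 56)
print(pointwise(depthwise(x)).shape)        # torch.Size([1, 64, 56, 56])

# compare parameter counts with a normal 3x3 convolution 32 -> 64:
normal = nn.Conv2d(32, 64, kernel_size=3, padding=1)
count = lambda m: sum(p.numel() for p in m.parameters())
print(count(normal))                        # 18496
print(count(depthwise) + count(pointwise))  # 2432 -> far fewer weights
```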


Fully Convolutional Network


convolutions act as feature extraction methods

a fully convolutional network ensures that the last few layers take the activation/feature maps and turn the information into a classification result

the fully connected layers are themselves converted to convolutional layers using 1x1 convolutions, as these are exactly equivalent to the fully connected layers (see the sketch below)

using bigger images is then not a problem, resulting in an output of size (H/32 x W/32 x # of channels)
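A sketch of this conversion in PyTorch (sizes follow the 6x6x256 → 4096 → 4096 → 1000 numbers above): a Linear layer over a 6x6x256 map is equivalent to a 6x6 convolution, and the following Linear layers become 1x1 convolutions, so bigger inputs simply yield a spatial map of class scores.

```python
import torch
import torch.nn as nn

fc_as_conv = nn.Sequential(
    nn.Conv2d(256, 4096, kernel_size=6),   # replaces Linear(6*6*256 = 9216, 4096)
    nn.ReLU(),
    nn.Conv2d(4096, 4096, kernel_size=1),  # replaces Linear(4096, 4096)
    nn.ReLU(),
    nn.Conv2d(4096, 1000, kernel_size=1),  # replaces Linear(4096, 1000)
)

print(fc_as_conv(torch.randn(1, 256, 6, 6)).shape)    # [1, 1000, 1, 1]
print(fc_as_conv(torch.randn(1, 256, 12, 12)).shape)  # [1, 1000, 7, 7]: a score map
```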


Semantic segmentation: reduce the dimension of the information, then increase it back to the original image size in the last layer/step → how do we upsample / go back to the original size?

interpolation - double the size, e.g. nearest-neighbour interpolation (a pixel without a value looks at its nearest neighbouring pixel and copies its value), bilinear interpolation (looks at several neighbours and takes a weighted average of their values), bicubic interpolation (again takes values from the neighbours, with cubic weighting)

transposed conv: take the representation, blow it up by spreading the given information equally across the new spatial dimensions, then process the representation with a series of convolutions

performing unpooling

initializing all empty spaces to 0, then continuing with convolutions to adjust the 0 values (upsampling sketches below)
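Upsampling sketches in PyTorch (sizes assumed): the interpolation modes vs. a learned transposed convolution, each doubling the spatial size.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

x = torch.randn(1, 16, 8, 8)

nearest = F.interpolate(x, scale_factor=2, mode="nearest")        # copy nearest value
bilinear = F.interpolate(x, scale_factor=2, mode="bilinear",
                         align_corners=False)                     # weighted average
learned = nn.ConvTranspose2d(16, 16, kernel_size=2, stride=2)(x)  # transposed conv

print(nearest.shape, bilinear.shape, learned.shape)  # all: [1, 16, 16, 16]
```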

U-Net

from left (contraction path, i.e. encoder) to right (expansion path, i.e. decoder): performing a series of convolutions (feature extraction) and pooling (feature selection) → during encoding we lose spatial detail, therefore the encoder results are copied over to the decoder so that it also has the earlier information (see the sketch below)
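A heavily reduced U-Net sketch in PyTorch (one level only, all channel sizes assumed): encoder features are copied and concatenated into the decoder so the spatial detail lost by pooling is available again.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyUNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.enc = nn.Conv2d(1, 16, 3, padding=1)           # contraction: features
        self.pool = nn.MaxPool2d(2)                         # selection, loses detail
        self.mid = nn.Conv2d(16, 32, 3, padding=1)
        self.up = nn.ConvTranspose2d(32, 16, 2, stride=2)   # expansion: upsample
        self.dec = nn.Conv2d(32, 16, 3, padding=1)          # 32 = 16 upsampled + 16 copied
        self.out = nn.Conv2d(16, 2, 1)                      # per-pixel class scores

    def forward(self, x):
        e = F.relu(self.enc(x))
        m = F.relu(self.mid(self.pool(e)))
        u = self.up(m)
        u = torch.cat([u, e], dim=1)  # skip connection: copy encoder features over
        return self.out(F.relu(self.dec(u)))

print(TinyUNet()(torch.randn(1, 1, 64, 64)).shape)  # [1, 2, 64, 64]: original size
```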

