
Deep Learning Approach for Image Processing

UNIT-5
Artificial Neural Networks
 Computational models inspired by the human brain:
 Algorithms that try to mimic the brain.
 Massively parallel, distributed systems made up of simple processing units (neurons)
 Synaptic connection strengths among neurons are used to store the acquired knowledge.
 Knowledge is acquired by the network from its environment through a learning process.
History
 Late 1800s – neural networks appear as an analogy to biological systems
 1960s and 70s – simple neural networks appear
  Fall out of favor because the perceptron is not effective by itself, and there were no good algorithms for multilayer nets
 1986 – backpropagation algorithm appears
  Neural networks have a resurgence in popularity
  More computationally expensive
Biological Neuron

 A variety of different neurons exist (motor neurons, on-center off-surround visual cells, …), with different branching structures.

 The connections of the network and the strengths of the individual synapses establish the function of the network.
Biological Neuron
Properties of ANNs
 Inputs are flexible
  any real values
  highly correlated or independent
 Target function may be discrete-valued, real-valued, or vectors of discrete or real values
 Outputs are real numbers between 0 and 1
 Resistant to errors in the training data
 Long training time
 Fast evaluation
 The function produced can be difficult for humans to interpret
When to consider neural networks
 Input is high-dimensional discrete or real-valued
 Output is discrete or real-valued
 Output is a vector of values
 Possibly noisy data
 Form of target function is unknown
 Human readability of the result is not important
Examples:
 Speech phoneme recognition
 Image classification
 Financial prediction
Perceptron
 Basic unit in a neural network
 Linear separator
 Parts
 N inputs, x1 ... xn
 Weights for each input, w1 ... wn
 A bias input x0 (constant) and associated weight w0
 Weighted sum of inputs, y = w0x0 + w1x1 + ... + wnxn
 A threshold function or activation function,
  i.e., output 1 if y > t, and -1 if y <= t
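As an illustrative sketch (not from the slides), the perceptron described above can be written directly in Python; the threshold, weights, and inputs below are hypothetical placeholders.

```python
import numpy as np

def perceptron_output(x, w, t):
    """Threshold unit: weighted sum of inputs compared against threshold t.
    x and w are equal-length arrays; a bias can be included as x0 = 1 with weight w0."""
    y = np.dot(w, x)           # weighted sum w0*x0 + w1*x1 + ... + wn*xn
    return 1 if y > t else -1  # activation: +1 above threshold, -1 otherwise

# Example with made-up numbers: two inputs plus a constant bias input x0 = 1
x = np.array([1.0, 0.5, -0.3])   # x0 (bias), x1, x2
w = np.array([0.2, 0.4, 0.7])    # w0, w1, w2
print(perceptron_output(x, w, t=0.0))
```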
A Neuron (= a perceptron)

[Figure: inputs x0, x1, ..., xn with weights w0, w1, ..., wn feed a weighted sum, followed by an activation function f with threshold t that produces the output y.]

For example:

 y = sign( Σ_{i=0}^{n} wi xi − t )

 The n-dimensional input vector x is mapped into the variable y by means of the scalar product and a nonlinear function mapping.
Artificial Neuron Model
Bias
Activation functions
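The slides show the standard activation functions as figures; a minimal Python sketch of the common choices (hard threshold, sigmoid, tanh, ReLU) is given below for reference.

```python
import numpy as np

def step(x, t=0.0):       # hard threshold (as in the basic perceptron)
    return np.where(x > t, 1.0, -1.0)

def sigmoid(x):           # smooth, differentiable squashing to (0, 1)
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):              # squashing to (-1, 1)
    return np.tanh(x)

def relu(x):              # rectified linear unit, common in deep networks
    return np.maximum(0.0, x)

z = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(step(z), sigmoid(z), tanh(z), relu(z), sep="\n")
```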
Perceptron

 Input values → Linear weighted sum → Threshold

Decision surface of a perceptron

 Representational power of perceptrons
  - Linearly separable case like (a): possible to classify by a hyperplane
  - Linearly inseparable case like (b): impossible to classify
Perceptron training rule (delta rule)

 wi ← wi + Δwi
 where Δwi = η (t – o) xi

Where:
 t = c(x) is the target value
 o is the perceptron output
 η is a small constant (e.g., 0.1) called the learning rate

Can prove it will converge
 if the training data is linearly separable
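A minimal sketch of this training rule in Python; the learning rate and training data are illustrative assumptions, not from the slides.

```python
import numpy as np

def train_perceptron(X, targets, eta=0.1, epochs=50):
    """Perceptron training rule: w_i <- w_i + eta * (t - o) * x_i.
    X has a leading column of 1s acting as the bias input x0."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for x, t in zip(X, targets):
            o = 1 if np.dot(w, x) > 0 else -1   # current perceptron output
            w += eta * (t - o) * x              # delta-rule weight update
    return w

# Example: learn a linearly separable AND-like function (made-up data)
X = np.array([[1, 0, 0], [1, 0, 1], [1, 1, 0], [1, 1, 1]], dtype=float)
t = np.array([-1, -1, -1, 1])
print(train_perceptron(X, t))
```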
Gradient descent
Derivation of gradient descent

 Gradient descent
 - Error (over all training examples d):
   E(w) = (1/2) Σ_d (t_d − o_d)²
 - The gradient of E (partial differentiation):
   ∇E(w) = [ ∂E/∂w0, ∂E/∂w1, ..., ∂E/∂wn ]
 - Direction: steepest increase in E.
 - Thus, the training rule is as follows:
   Δw = −η ∇E(w)
   (the negative sign gives the direction that decreases E)
Derivation of gradient descent

 ∂E/∂wi = Σ_d (t_d − o_d)(−x_id)

 where x_id denotes the single input component x_i for training example d

 - The weight update rule for gradient descent:
   Δwi = η Σ_d (t_d − o_d) x_id
Gradient descent and delta rule

 Because the error surface contains only a single global minimum, this algorithm will converge to a weight vector with minimum error, given that a sufficiently small learning rate η is used.
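A hedged Python sketch of batch gradient descent for a linear unit, following the update rule above; the synthetic data and learning rate are assumptions for illustration.

```python
import numpy as np

def gradient_descent_linear_unit(X, t, eta=0.01, epochs=200):
    """Batch gradient descent for a linear unit o = w . x.
    Each epoch applies Delta w_i = eta * sum_d (t_d - o_d) * x_id over all examples."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        o = X @ w                      # outputs for every training example
        w += eta * X.T @ (t - o)       # batch update over the whole training set
    return w

# Illustrative data: targets generated by a known linear rule plus a little noise
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(50), rng.normal(size=(50, 2))])  # bias column + 2 inputs
t = X @ np.array([0.5, 2.0, -1.0]) + rng.normal(scale=0.1, size=50)
print(gradient_descent_linear_unit(X, t))   # should approach [0.5, 2.0, -1.0]
```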
Hypothesis space of gradient descent

 - Error of different hypotheses
 - For a linear unit with two weights, the hypothesis space H is the (w0, w1) plane.
 - This error surface must be parabolic with a single global minimum (we desire a hypothesis with minimum error).
Stochastic approximation to gradient descent

 - Stochastic gradient descent (i.e., incremental mode) can sometimes avoid falling into local minima because it follows the gradient of the per-example error E_d rather than the overall gradient of E.
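For comparison with the batch sketch above, a stochastic (incremental) version updates the weights after every single example; again an illustrative sketch with the same assumed data layout (bias column first).

```python
import numpy as np

def sgd_linear_unit(X, t, eta=0.01, epochs=200):
    """Incremental gradient descent: follow the gradient of the per-example
    error E_d = 1/2 (t_d - o_d)^2 instead of the summed error E."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for x_d, t_d in zip(X, t):
            o_d = x_d @ w
            w += eta * (t_d - o_d) * x_d   # update after each example
    return w
```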
Summary

 Perceptron training rule guaranteed to succeed if
  training examples are linearly separable
  sufficiently small learning rate η

 Linear unit training rule using gradient descent
  Converges asymptotically to the minimum-error hypothesis
  (guaranteed to converge to the hypothesis with minimum squared error)
THE PERCEPTRON

[The following slides are presented as figures:]
Computing with McCulloch-Pitts Neurons
Limitation of MP-neurons
Perceptron Analysis
Perceptron Learning Rule
Perceptron Learning Algorithm
Pocket Algorithm
Adaline
Adaline Analysis
Adaline Learning Principle
Adaline Learning Algorithm
Multilayer networks and the Backpropagation Algorithm
 Speech recognition example of multilayer networks learned by the backpropagation algorithm
 Highly nonlinear decision surfaces
Multilayer networks and the backpropagation algorithm
Sigmoid Threshold Unit
The Backpropagation algorithm
Adding Momentum

 Often include a weight momentum term α:

   Δw_ji(n) = η δ_j x_ji + α Δw_ji(n−1)

 - The update at the nth iteration depends on the (n−1)th iteration
 - α: constant between 0 and 1 (the momentum)

 Roles of the momentum term
  The effect of keeping the ball rolling through small local minima in the error surface
  The effect of gradually increasing the step size of the search in flat regions of the error surface (greatly improves the speed of learning)
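A minimal sketch of a gradient-descent step with momentum, following the idea above; the learning rate, momentum value, and toy error function are assumptions for illustration.

```python
def momentum_step(w, grad, velocity, eta=0.1, alpha=0.9):
    """One update with momentum: Delta w(n) = -eta * grad + alpha * Delta w(n-1)."""
    velocity = -eta * grad + alpha * velocity   # carry part of the previous update
    return w + velocity, velocity

# Toy 1-D quadratic error E(w) = (w - 3)^2 with gradient dE/dw = 2 * (w - 3)
w, v = 0.0, 0.0
for _ in range(200):
    w, v = momentum_step(w, 2 * (w - 3), v)
print(round(w, 4))   # converges toward the minimum at w = 3
```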
The Backpropagation algorithm
Convergence and Local Minima

 Gradient descent converges to some local minimum
  Perhaps not the global minimum...
 Add momentum
 Use stochastic gradient descent
Artificial Neuron Model
Applications of ANNs

 ANNs have been widely used in various domains for:
  Pattern recognition
  Function approximation
  Associative memory
Types of connectivity

 Feedforward networks
  These compute a series of transformations
  Typically, the first layer is the input and the last layer is the output.
  [Figure: input units → hidden units → output units]
 Recurrent networks
  These have directed cycles in their connection graph. They can have complicated dynamics.
  More biologically realistic.
Artificial Neural Network
Different Network Topologies
 Single-layer feed-forward networks
  Input layer projecting into the output layer
  [Figure: single-layer network, input layer → output layer]
Different Network Topologies
 Multi-layer feed-forward networks
  One or more hidden layers. Input projects only from previous layers onto a layer.
  [Figure: 2-layer (1-hidden-layer) fully connected network, input layer → hidden layer → output layer]
Different Network Topologies
 Multi-layer feed-forward networks
  [Figure: input layer → hidden layers → output layer]
Different Network Topologies
 Recurrent networks
  A network with feedback, where some of its inputs are connected to some of its outputs (discrete time).
  [Figure: recurrent network, input layer → output layer with feedback connections]
How to Decide on a Network Topology?
Algorithm for learning ANN
 Initialize the weights (w0, w1, …, wk)

 Adjust the weights in such a way that the output of the ANN is consistent with the class labels of the training examples

 Error function:
   E = Σ_i [ Yi − f(wi, Xi) ]²

 Find the weights wi that minimize the above error function
  e.g., gradient descent, backpropagation algorithm
Optimizing concave/convex functions

 Maximum of a concave function = minimum of a convex function
 Gradient ascent (concave) / gradient descent (convex)

 Gradient ascent rule: w ← w + η ∂f/∂w
Decision surface of a perceptron

 Decision surface is a hyperplane
  Can capture linearly separable classes
 Non-linearly separable
  Use a network of them
Multi-layer Networks
 Linear units are inappropriate
  No more expressive than a single layer
 Introduce non-linearity
  Threshold is not differentiable
 Use the sigmoid function
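A hedged sketch of a small multi-layer network with sigmoid units, illustrating why a differentiable non-linearity is needed; the layer sizes and random weights are assumptions for illustration.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def mlp_forward(x, W1, b1, W2, b2):
    """Forward pass of a 1-hidden-layer network with sigmoid activations.
    The sigmoid is differentiable, so its gradient can be backpropagated,
    unlike the hard threshold of the basic perceptron."""
    h = sigmoid(W1 @ x + b1)      # hidden layer
    return sigmoid(W2 @ h + b2)   # output layer

rng = np.random.default_rng(1)
x = rng.normal(size=3)                            # 3 inputs
W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)     # 4 hidden units
W2, b2 = rng.normal(size=(2, 4)), np.zeros(2)     # 2 outputs
print(mlp_forward(x, W1, b1, W2, b2))
```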
Backpropagation
 Iteratively process a set of training tuples & compare the network's
prediction with the actual known target value
 For each training tuple, the weights are modified to minimize the mean
squared error between the network's prediction and the actual target
value
 Modifications are made in the “backwards” direction: from the output
layer, through each hidden layer down to the first hidden layer, hence
“backpropagation”
 Steps
  Initialize weights (to small random #s) and biases in the network
  Propagate the inputs forward (by applying the activation function)
  Backpropagate the error (by updating weights and biases)
  Terminating condition (when the error is very small, etc.)
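A compact sketch of these steps for a 1-hidden-layer sigmoid network with squared error; the interface and learning rate are illustrative assumptions, not the slides' notation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def backprop_step(x, target, W1, b1, W2, b2, eta=0.5):
    """One forward pass plus one backward pass for E = 1/2 * sum (t - o)^2."""
    # Forward: propagate the inputs through the network
    h = sigmoid(W1 @ x + b1)
    o = sigmoid(W2 @ h + b2)
    # Backward: output-layer and hidden-layer error terms
    delta_o = (o - target) * o * (1 - o)
    delta_h = (W2.T @ delta_o) * h * (1 - h)
    # Update weights and biases in the "backwards" direction
    W2 -= eta * np.outer(delta_o, h); b2 -= eta * delta_o
    W1 -= eta * np.outer(delta_h, x); b1 -= eta * delta_h
    return 0.5 * np.sum((target - o) ** 2)   # current error on this example
```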
How Does a Multi-Layer Neural Network Work?
 The inputs to the network correspond to the attributes measured for
each training tuple
 Inputs are fed simultaneously into the units making up the input layer
 They are then weighted and fed simultaneously to a hidden layer
 The number of hidden layers is arbitrary, although usually only one
 The weighted outputs of the last hidden layer are input to units making
up the output layer, which emits the network's prediction
 The network is feed-forward in that none of the weights cycles back to
an input unit or to an output unit of a previous layer
 From a statistical point of view, networks perform nonlinear regression:
Given enough hidden units and enough training samples, they can
closely approximate any function

Defining a Network Topology

 First decide the network topology: # of units in the input layer, # of hidden layers (if > 1), # of units in each hidden layer, and # of units in the output layer
 Normalize the input values for each attribute measured in the training tuples to [0.0, 1.0]
 One input unit per domain value, each initialized to 0
 Output: for classification with more than two classes, one output unit per class is used
 If a trained network's accuracy is unacceptable, repeat the training process with a different network topology or a different set of initial weights
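A small sketch of the [0.0, 1.0] min-max normalization mentioned above; the example values are made up.

```python
import numpy as np

def minmax_normalize(X):
    """Rescale each attribute (column) to the range [0.0, 1.0]."""
    x_min, x_max = X.min(axis=0), X.max(axis=0)
    return (X - x_min) / (x_max - x_min)

X = np.array([[2.0, 100.0], [4.0, 300.0], [6.0, 500.0]])
print(minmax_normalize(X))   # each column now spans 0.0 to 1.0
```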
Backpropagation and Interpretability
 Efficiency of backpropagation: each epoch (one iteration through the training set) takes O(|D| * w) time, with |D| tuples and w weights, but the number of epochs can be exponential in n, the number of inputs, in the worst case
 Rule extraction from networks: network pruning
 Simplify the network structure by removing weighted links that have the
least effect on the trained network
 Then perform link, unit, or activation value clustering
 The set of input and activation values are studied to derive rules
describing the relationship between the input and hidden unit layers
 Sensitivity analysis: assess the impact that a given input variable has on a
network output. The knowledge gained from this analysis can be
represented in rules

Neural Network as a Classifier
 Weakness
 Long training time
 Require a number of parameters typically best determined empirically,
e.g., the network topology or “structure.”
 Poor interpretability: Difficult to interpret the symbolic meaning behind
the learned weights and of “hidden units” in the network
 Strength
 High tolerance to noisy data
 Ability to classify untrained patterns
 Well-suited for continuous-valued inputs and outputs
 Successful on a wide array of real-world data
 Algorithms are inherently parallel
 Techniques have recently been developed for the extraction of rules
from trained neural networks

Artificial Neural Networks (ANN)

[Figure: a black-box model with input nodes X1, X2, X3, each connected with weight 0.3 to an output node Y with threshold t = 0.4.]

 X1 X2 X3 | Y
  1  0  0 | 0
  1  0  1 | 1
  1  1  0 | 1
  1  1  1 | 1
  0  0  1 | 0
  0  1  0 | 0
  0  1  1 | 1
  0  0  0 | 0

 Y = I(0.3 X1 + 0.3 X2 + 0.3 X3 − 0.4 > 0)

 where I(z) = 1 if z is true, 0 otherwise
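The threshold unit above can be checked directly in code; this short sketch reproduces the truth table from the slide.

```python
def ann_output(x1, x2, x3):
    """Y = I(0.3*X1 + 0.3*X2 + 0.3*X3 - 0.4 > 0): fires when at least two inputs are 1."""
    z = 0.3 * x1 + 0.3 * x2 + 0.3 * x3 - 0.4
    return 1 if z > 0 else 0

for x1 in (0, 1):
    for x2 in (0, 1):
        for x3 in (0, 1):
            print(x1, x2, x3, "->", ann_output(x1, x2, x3))
```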
Characteristics of Artificial Neural Networks
The characteristics of Artificial Neural Networks are:

 It is a neurally implemented mathematical model.

 It contains a large number of interconnected processing elements called neurons that perform all the operations.

 Information stored in the neurons is basically the weighted linkage of neurons.

 The input signals arrive at the processing elements through connections and connecting weights.

 It has the ability to learn, recall, and generalize from the given data by suitable assignment and adjustment of weights.

 The collective behaviour of the neurons describes its computational power, and no single neuron carries specific information.
Deep learning vs Machine Learning
Back Propagation Algorithm

The main feature of backpropagation is the iterative, recursive and efficient method by which it calculates the updated weights to improve the network until it is able to perform the task for which it is being trained.

 Backpropagation requires the derivatives of the activation functions to be known at network design time.

Now, how is the error function used in backpropagation, and how does backpropagation work?

 Let us start with an example and work through it mathematically to understand exactly how backpropagation updates the weights.
Back Propagation Algorithm
Back Propagation Algorithm

Now, we first calculate the values of H1 and H2 by a forward pass.

Forward Pass

To find the value of H1, we first multiply the input values by the weights and add the bias:
Back Propagation Algorithm
H1 = x1×w1 + x2×w2 + b1

H1 = 0.05×0.15 + 0.10×0.20 + 0.35

H1 = 0.3775

To calculate the final output of H1, we apply the sigmoid function:

H1final = 1 / (1 + e^(−0.3775)) = 0.593269992

Back Propagation Algorithm
We will calculate the value of H2 in the same way as H1:

H2 = x1×w3 + x2×w4 + b1

H2 = 0.05×0.25 + 0.10×0.30 + 0.35

H2 = 0.3925

To calculate the final output of H2, we apply the sigmoid function:

H2final = 1 / (1 + e^(−0.3925)) = 0.596884378

Back Propagation Algorithm
Now, we calculate the values of y1 and y2 in the same way as we calculated H1 and H2.

To find the value of y1, we multiply the input values, i.e., the outputs of H1 and H2, by the weights:

y1 = H1final×w5 + H2final×w6 + b2

y1 = 0.593269992×0.40 + 0.596884378×0.45 + 0.60

y1 = 1.10590597

To calculate the final output of y1, we apply the sigmoid function:

y1final = 1 / (1 + e^(−1.10590597)) = 0.75136507

Back Propagation Algorithm

We will calculate the value of y2 in the same way as y1


y2 = H1final×w7 + H2final×w8 + b2

y2 = 0.593269992×0.50 + 0.596884378×0.55 + 0.60

y2=1.2249214
Back Propagation Algorithm
To calculate the final output of y2, we apply the sigmoid function:

y2final = 1 / (1 + e^(−1.2249214)) = 0.772928465

Our target values are T1 = 0.01 and T2 = 0.99. Our y1 and y2 values do not match the target values T1 and T2.

Now, we will find the total error, which is simply the sum of the squared differences between the outputs and the target outputs. The total error is calculated as

Etotal = Σ ½ (target − output)²
Back Propagation Algorithm

So, the total error is

Etotal = ½(0.01 − 0.75136507)² + ½(0.99 − 0.772928465)² = 0.298371109

Now, we will backpropagate this error to update the weights using a backward pass.
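The forward pass and total error above can be reproduced with a few lines of Python; the weights w1..w8, biases b1, b2, inputs, and targets are taken from the worked example.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Inputs, weights, biases, and targets from the worked example
x1, x2 = 0.05, 0.10
w1, w2, w3, w4 = 0.15, 0.20, 0.25, 0.30
w5, w6, w7, w8 = 0.40, 0.45, 0.50, 0.55
b1, b2 = 0.35, 0.60
T1, T2 = 0.01, 0.99

# Forward pass: hidden layer, then output layer
H1 = sigmoid(x1 * w1 + x2 * w2 + b1)   # 0.593269992
H2 = sigmoid(x1 * w3 + x2 * w4 + b1)   # 0.596884378
y1 = sigmoid(H1 * w5 + H2 * w6 + b2)   # 0.75136507
y2 = sigmoid(H1 * w7 + H2 * w8 + b2)   # 0.772928465

# Total error: sum of half squared differences
E_total = 0.5 * (T1 - y1) ** 2 + 0.5 * (T2 - y2) ** 2
print(E_total)                          # 0.298371109
```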
Back Propagation Algorithm
Backward pass at the output layer
To update a weight, we calculate the error corresponding to that weight with the help of the total error. The error on weight w is calculated by differentiating the total error with respect to w.

We perform the backward process, so first consider the last weight, w5.

From equation (2), it is clear that we cannot partially differentiate it with respect to w5 because there is no w5 in it. We split equation (1) into multiple terms (the chain rule) so that we can easily differentiate it with respect to w5:

∂Etotal/∂w5 = ∂Etotal/∂y1final × ∂y1final/∂y1 × ∂y1/∂w5
Back Propagation Algorithm

Now, we calculate each term one by one to differentiate Etotal with respect to w5.
Back Propagation Algorithm

Putting the value of e^(−y1) into equation (5):


Back Propagation Algorithm
Back Propagation Algorithm

Backward pass at the hidden layer

Now, we will backpropagate to our hidden layer and update the weights w1, w2, w3, and w4 as we did with the weights w5, w6, w7, and w8.

We will calculate the error at w1 as


Back Propagation Algorithm
From equation (2), it is clear that we cannot partially differentiate it with respect to w1 because there is no w1 in it. We split equation (1) into multiple terms so that we can easily differentiate it with respect to w1.

Now, we calculate each term one by one to differentiate Etotal with respect to w1.

We again split this expression, because there is no H1final term in Etotal:


Back Propagation Algorithm
Back Propagation Algorithm
Back Propagation Algorithm
Back Propagation Algorithm
Back Propagation Algorithm
Back Propagation Algorithm
Back Propagation Algorithm

Now, we will calculate the updated weight w1new with the help of the following formula:

w1new = w1 − η × ∂Etotal/∂w1
Back Propagation Algorithm

We have updated all the weights. We found an error of 0.298371109 on the network when we fed the inputs 0.05 and 0.1 forward. After the first round of backpropagation, the total error is down to 0.291027924. After repeating this process 10,000 times, the total error is down to 0.0000351085.

At this point, the output neurons generate 0.015912196 and 0.984065734, i.e., close to our target values, when we feed forward the inputs 0.05 and 0.1.
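As a hedged check of these numbers, the sketch below runs the full backward pass and repeats training; the learning rate η = 0.5 is an assumption (it is not stated in the text) and the biases are left unchanged, as in the worked example.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(w, b1, b2, x1, x2):
    """Forward pass; returns hidden and output activations."""
    H1 = sigmoid(x1 * w[0] + x2 * w[1] + b1)
    H2 = sigmoid(x1 * w[2] + x2 * w[3] + b1)
    y1 = sigmoid(H1 * w[4] + H2 * w[5] + b2)
    y2 = sigmoid(H1 * w[6] + H2 * w[7] + b2)
    return H1, H2, y1, y2

# Weights, biases, inputs, and targets from the worked example
w = [0.15, 0.20, 0.25, 0.30, 0.40, 0.45, 0.50, 0.55]   # w1..w8
b1, b2 = 0.35, 0.60
x1, x2, T1, T2 = 0.05, 0.10, 0.01, 0.99
eta = 0.5   # assumed learning rate (not stated in the text)

for _ in range(10000):
    H1, H2, y1, y2 = forward(w, b1, b2, x1, x2)
    d1 = (y1 - T1) * y1 * (1 - y1)                   # output-layer error terms
    d2 = (y2 - T2) * y2 * (1 - y2)
    dH1 = (d1 * w[4] + d2 * w[6]) * H1 * (1 - H1)    # hidden-layer error terms
    dH2 = (d1 * w[5] + d2 * w[7]) * H2 * (1 - H2)
    # Gradient-descent updates: w_new = w - eta * dE_total/dw
    w = [w[0] - eta * dH1 * x1, w[1] - eta * dH1 * x2,
         w[2] - eta * dH2 * x1, w[3] - eta * dH2 * x2,
         w[4] - eta * d1 * H1,  w[5] - eta * d1 * H2,
         w[6] - eta * d2 * H1,  w[7] - eta * d2 * H2]

_, _, y1, y2 = forward(w, b1, b2, x1, x2)
print(y1, y2, 0.5 * (T1 - y1) ** 2 + 0.5 * (T2 - y2) ** 2)
```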
Convolutional Neural Network
Convolutional Neural Network
Convolution
Convolutional Neural Network
Convolution Properties
Convolutional Neural Network
ConvNet
ConvNet architectures for images:

 A fully-connected structure does not scale to large images.
 The explicit assumption that the inputs are images allows us to encode certain properties into the architecture.
  These make the forward function more efficient to implement.
  They vastly reduce the number of parameters in the network.

 3D volumes: neurons are arranged in 3 dimensions: width, height, depth.
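A quick sketch of why a fully connected layer does not scale, comparing parameter counts for a fully connected layer versus a small convolutional layer on a 200x200x3 image; the layer sizes are illustrative assumptions.

```python
# Fully connected: every hidden unit connects to every input value
image_inputs = 200 * 200 * 3          # 120,000 input values
hidden_units = 1000
fc_params = image_inputs * hidden_units
print(f"fully connected: {fc_params:,} weights")      # 120,000,000

# Convolutional: a small filter is shared across all spatial positions
filter_h, filter_w, in_depth, num_filters = 5, 5, 3, 64
conv_params = filter_h * filter_w * in_depth * num_filters + num_filters  # + biases
print(f"convolutional:   {conv_params:,} weights")     # 4,864
```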
Convolutional Neural Network
Convolutional Neural Network
Convolutional Neural Network
Back propagation with weight constraints
Convolutional Neural Network
What does replicating the feature detectors achieve?
Convolutional Neural Network
Pooling the outputs of replicated feature detectors
Convolutional Neural Network
Example Architecture for CIFAR-10
Convolutional Neural Network
Convolution Layer
Convolutional Neural Network
Convolution
Convolutional Neural Network
Convolutions: More detail
[A sequence of figure slides steps through the convolution operation in more detail.]
Convolutional Neural Network
Spatial arrangement
Convolutional Neural Network
Spatial arrangement
Convolutional Neural Network
Spatial arrangement
Convolutional Neural Network
Parameter Sharing
Convolutional Neural Network
Parameter Sharing
Convolutional Neural Network
Summary of Conv Layer
Convolutional Neural Network
Spatial Pooling
Convolutional Neural Network
3. Spatial Pooling
Convolutional Neural Network
Pooling Layer
Convolutional Neural Network
General pooling layer
Convolutional Neural Network
General pooling
Convolutional Neural Network
Getting rid of pooling
Convolutional Neural Network
Getting rid of pooling (2)
Softmax function
(Normalized exponential function)

 If we take an input of [1, 2, 3, 4, 1, 2, 3], its softmax is
 [0.024, 0.064, 0.175, 0.475, 0.024, 0.064, 0.175].

 The softmax function highlights the largest values and suppresses the other values. Compared to the “max” function, softmax is differentiable.
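The softmax of the example vector above can be verified with a short sketch.

```python
import numpy as np

def softmax(z):
    """Normalized exponential: exp(z_i) / sum_j exp(z_j).
    Subtracting max(z) first keeps the exponentials numerically stable."""
    z = np.asarray(z, dtype=float)
    e = np.exp(z - np.max(z))
    return e / e.sum()

print(np.round(softmax([1, 2, 3, 4, 1, 2, 3]), 3))
# [0.024 0.064 0.175 0.475 0.024 0.064 0.175]
```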
Convolutional Neural Network
Fully-connected layer
Convolutional Neural Network
Converting FC layers to CONV layers
Convolutional Neural Network
ConvNet Architectures
Convolutional Neural Network
Convolutional Neural Network
Recent Departures
Convolutional Neural Network
Case Studies
Convolutional Neural Network
Convolutional Neural Network
LeNet
Convolutional Neural Network
LeNet
Convolutional Neural Network
From hand-written digits to 3-D objects
Convolutional Neural Network
The ILSVRC-2012 competition on ImageNet
Convolutional Neural Network
Convolutional Neural Network
A neural network for ImageNet
Convolutional Neural Network
A Common Architecture: AlexNet
Convolutional Neural Network
Convolutional Neural Network
Convolutional Neural Network
Tricks that significantly improve generalization
Convolutional Neural Network
The hardware required for Alex’s net
Convolutional Neural Network
Case Study: ZFNet
Convolutional Neural Network
Case Studies
Convolutional Neural Network
Case Study: VGGNet
Convolutional Neural Network
Case Study: GoogLeNet
Convolutional Neural Network
GoogLeNet vs State of the art
Convolutional Neural Network
Residual Network
Convolutional Neural Network
Plain Network
Convolutional Neural Network
Residual Network
Convolutional Neural Network
Results

Practical matters
