Back Propagation

Artificial Neural Networks

Slides 2 to 10 and 15 to 21: Ref. Tom Mitchell
Slides 11 to 14 and 22 to end: Ref. Jia Wei

Ref. book: Machine Learning by Tom Mitchell
Perceptron Training Rule

One way to learn an acceptable weight vector is to begin with random weights, then
iteratively apply the perceptron to each training example, modifying the perceptron weights
whenever it misclassifies an example. This process is repeated, iterating through the training
examples as many times as needed until the perceptron classifies all training examples
correctly. Weights are modified at each step according to the perceptron training rule, which
revises the weight wi associated with input xi according to the rule

wi ← wi + Δwi, where Δwi = η(t − o)xi

Here t is the target output for the current training example, o is the output generated by
the perceptron, and η is a positive constant called the learning rate.

For example, if xi = 0.8, η = 0.1, t = 1, and o = −1,
then the weight update will be Δwi = η(t − o)xi = 0.1(1 − (−1))0.8 = 0.16, so the weight is
increased.

On the other hand, if t = -1 and o = 1, then weights associated with positive xi will
be decreased rather than increased.
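Both cases can be checked with a short sketch (the variable names are illustrative):

```python
# Perceptron training rule: delta_w = eta * (t - o) * x_i
eta = 0.1     # learning rate
x_i = 0.8     # input value associated with weight w_i
t, o = 1, -1  # target and actual output disagree

delta_w = eta * (t - o) * x_i
print(delta_w)  # positive update: the weight is increased

# Reversed case from the text: t = -1, o = 1 pushes the weight down
delta_w_reversed = eta * (-1 - 1) * x_i
print(delta_w_reversed)  # negative update: the weight is decreased
```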

The training error of a linear unit is defined as

E(w) = 1/2 Σd∈D (td − od)²

where D is the set of training examples, td is the target output for training example d,
and od is the output of the linear unit for training example d. By this definition, E(w) is
simply half the squared difference between the target output td and the unit output od,
summed over all training examples.

Here we characterize E as a function of w, because the linear unit output o depends on
this weight vector.
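The definition can be written out directly as a sketch (the training data here is hypothetical):

```python
# E(w) = 1/2 * sum over d in D of (t_d - o_d)^2, with o = w . x for a linear unit
def linear_output(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

def squared_error(w, examples):
    # examples is a list of (x, t) pairs standing in for the training set D
    return 0.5 * sum((t - linear_output(w, x)) ** 2 for x, t in examples)

examples = [([1.0, 0.0], 1.0), ([0.0, 1.0], -1.0)]  # hypothetical training set
print(squared_error([0.0, 0.0], examples))  # both outputs are 0, so E = 0.5*(1+1)
```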

Error Surface

Gradient descent search determines a weight vector that
minimizes E by starting with an arbitrary initial weight
vector, then repeatedly modifying it in small steps. At
each step, the weight vector is altered in the direction
that produces the steepest descent along the error
surface depicted in the figure. This process continues until
the global minimum error is reached.
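The search just described can be sketched for a linear unit. The batch update Δwi = η Σd (td − od) xid comes from the gradient derivation below; the data and learning rate here are illustrative assumptions:

```python
# Batch gradient descent for a linear unit: each step moves the weight
# vector opposite the gradient of E, i.e. w_i += eta * sum_d (t_d - o_d) * x_id
def gd_step(w, examples, eta):
    deltas = [0.0] * len(w)
    for x, t in examples:
        o = sum(wi * xi for wi, xi in zip(w, x))  # linear unit output
        for i, xi in enumerate(x):
            deltas[i] += eta * (t - o) * xi
    return [wi + d for wi, d in zip(w, deltas)]

def error(w, examples):
    return 0.5 * sum((t - sum(wi * xi for wi, xi in zip(w, x))) ** 2
                     for x, t in examples)

examples = [([1.0, 1.0], 1.0), ([1.0, -1.0], -1.0)]  # hypothetical data
w = [0.0, 0.0]
e0 = error(w, examples)
for _ in range(50):
    w = gd_step(w, examples, eta=0.1)
print(e0, error(w, examples))  # the error shrinks toward the global minimum
```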

DERIVATION OF THE GRADIENT DESCENT RULE

How can we calculate the direction of steepest descent along the error surface?

This direction can be found by computing the derivative of E with respect to each
component of the vector w. This vector derivative is called the gradient of E with
respect to w, written

∇E(w) = [∂E/∂w0, ∂E/∂w1, …, ∂E/∂wn]

BACK PROPAGATION
“How does backpropagation work?”

Backpropagation learns by iteratively processing a data set of training tuples, comparing the
network’s prediction for each tuple with the actual known target value.

The target value may be the known class label of the training tuple (for classification problems)
or a continuous value (for numeric prediction).

For each training tuple, the weights are modified so as to minimize the mean-squared error
between the network’s prediction and the actual target value.

These modifications are made in the “backwards” direction (i.e., from the output layer)
through each hidden layer down to the first hidden layer (hence the name backpropagation).

Gradient Descent (GD) is a widely used optimization algorithm in machine learning and deep
learning that minimizes the cost function of a neural network model during training. It works by
iteratively adjusting the weights or parameters of the model in the direction of the negative
gradient of the cost function until the minimum of the cost function is reached.

The cost function evaluates the difference between the actual and predicted outputs.

The algorithm calculates gradients, the partial derivatives of the cost function
with respect to each parameter.

Multilayer feed-forward networks, given enough hidden units and enough training samples, can
closely approximate any function.
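As a small illustration of this claim, a 2-2-1 sigmoid network trained by gradient descent can learn XOR, which no single-layer perceptron can represent. All weights, the learning rate, and the epoch count below are illustrative assumptions:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

data = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 0)]  # XOR

# Fixed, asymmetric initial weights (last entry of each row is the bias)
Wh = [[0.5, -0.4, 0.1], [-0.3, 0.6, -0.2]]  # two hidden units
Wo = [0.4, -0.5, 0.2]                        # one output unit

def forward(x):
    h = [sigmoid(w[0] * x[0] + w[1] * x[1] + w[2]) for w in Wh]
    o = sigmoid(Wo[0] * h[0] + Wo[1] * h[1] + Wo[2])
    return h, o

def total_error():
    return sum((t - forward(x)[1]) ** 2 for x, t in data)

eta = 0.5
before = total_error()
for _ in range(5000):
    for x, t in data:
        h, o = forward(x)
        err_o = o * (1 - o) * (t - o)                              # output-unit error
        err_h = [h[j] * (1 - h[j]) * err_o * Wo[j] for j in range(2)]
        for j in range(2):                                          # output weights
            Wo[j] += eta * err_o * h[j]
        Wo[2] += eta * err_o                                        # output bias
        for j in range(2):                                          # hidden weights
            for i in range(2):
                Wh[j][i] += eta * err_h[j] * x[i]
            Wh[j][2] += eta * err_h[j]
print(before, total_error())  # the squared error drops as the net fits XOR
```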

For a unit j in the output layer, the error is

Errj = Oj (1 − Oj)(Tj − Oj)

where Oj is the actual output of unit j and Tj is the known target value.

To compute the error of a hidden layer unit j, the weighted sum of the errors of
the units connected to unit j in the next layer is considered. The error of a
hidden layer unit j is

Errj = Oj (1 − Oj) Σk Errk wjk

where wjk is the weight of the connection from unit j to a unit k in the next higher layer,
and Errk is the error of unit k.

Multi Layer Feed Forward Network Problem

Learning rate = 0.9; X = [1, 0, 1]; Target class label = 1
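One forward and backward pass for a problem of this shape (three inputs, two hidden units 4 and 5, one output unit 6) can be sketched as follows. The initial weights and biases below are illustrative assumptions, since the figure's values are not reproduced in the text; the learning rate and input are those stated above:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

eta = 0.9
x = [1, 0, 1]   # inputs to units 1, 2, 3
target = 1

# Assumed initial weights w[(i, j)] from unit i to unit j, and biases theta[j]
w = {(1, 4): 0.2, (2, 4): 0.4, (3, 4): -0.5,
     (1, 5): -0.3, (2, 5): 0.1, (3, 5): 0.2,
     (4, 6): -0.3, (5, 6): -0.2}
theta = {4: -0.4, 5: 0.2, 6: 0.1}

# Forward pass: net input, then sigmoid output, for units 4, 5, then 6
o = {1: x[0], 2: x[1], 3: x[2]}
for j in (4, 5):
    o[j] = sigmoid(sum(o[i] * w[(i, j)] for i in (1, 2, 3)) + theta[j])
o[6] = sigmoid(o[4] * w[(4, 6)] + o[5] * w[(5, 6)] + theta[6])

# Backward pass: output-unit error, then hidden-unit errors
err = {6: o[6] * (1 - o[6]) * (target - o[6])}
for j in (4, 5):
    err[j] = o[j] * (1 - o[j]) * err[6] * w[(j, 6)]

# Weight and bias updates: w_ij += eta * Err_j * O_i; theta_j += eta * Err_j
for (i, j) in list(w):
    w[(i, j)] += eta * err[j] * o[i]
for j in theta:
    theta[j] += eta * err[j]

print(round(o[6], 3), round(err[6], 4))  # network output and its error term
```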


