4.back Propagation New

Uploaded by

Kavitha


4. BACKPROPAGATION ALGORITHM

4.1 Learning as gradient descent

A popular learning method capable of handling such large learning problems is the backpropagation
algorithm. This numerical method was used by different research communities in different
contexts, was discovered and rediscovered, until in 1985 it found its way into connectionist AI
mainly through the work of the PDP group [382]. It has been one of the most studied and used
algorithms for neural network learning ever since.

In this chapter we present a proof of the backpropagation algorithm based on a graphical approach
in which the algorithm reduces to a graph labelling problem. This method is not only more general
than the usual analytical derivations, which handle only the case of special network topologies,
but also much easier to follow. It also shows how the algorithm can be efficiently implemented in
computing systems in which only local information can be transported through the network.

7.1.1 Differentiable activation functions

The backpropagation algorithm looks for the minimum of the error function in weight space using
the method of gradient descent. The combination of weights which minimizes the error function is
considered to be a solution of the learning problem. Since this method requires computation of the
gradient of the error function at each iteration step, we must guarantee the continuity and
differentiability of the error function. Obviously we have to use a kind of activation function other
than the step function used in perceptrons, because the composite function produced by
interconnected perceptrons is discontinuous, and therefore the error function too. One of the more
popular activation functions for backpropagation networks is the sigmoid, a real function
$s_c : \mathbb{R} \to (0,1)$ defined by the expression

$$ s_c(x) = \frac{1}{1 + e^{-cx}}. $$

The constant c can be selected arbitrarily and its reciprocal 1/c is called the temperature parameter
in stochastic neural networks. The shape of the sigmoid changes according to the value of c, as
can be seen in Figure 7.1. The graph shows the shape of the sigmoid for c = 1, c = 2 and c = 3.
Higher values of c bring the shape of the sigmoid closer to that of the step function and in the
limit c → ∞ the sigmoid converges to a step function at the origin. In order to simplify all
expressions derived in this chapter we set c = 1, but after going through this material the reader
should be able to generalize all the expressions for a variable c. In the following we call the
sigmoid $s_1(x)$ just $s(x)$.
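
The effect of the constant c can be checked numerically. The following is a minimal sketch, assuming the usual logistic form $s_c(x) = 1/(1 + e^{-cx})$; the function names `sigmoid` and `sigmoid_prime` are illustrative and not taken from the text.

```python
import math

def sigmoid(x, c=1.0):
    """Logistic sigmoid s_c(x) = 1 / (1 + exp(-c * x)) (assumed form)."""
    return 1.0 / (1.0 + math.exp(-c * x))

def sigmoid_prime(x, c=1.0):
    """Derivative of s_c with respect to x: c * s_c(x) * (1 - s_c(x))."""
    s = sigmoid(x, c)
    return c * s * (1.0 - s)

# Larger c pushes the values at a fixed x != 0 closer to 0 or 1,
# i.e. closer to the step function.
for c in (1.0, 2.0, 3.0):
    print(c, [round(sigmoid(x, c), 3) for x in (-2.0, 0.0, 2.0)])
```

For c = 1 the derivative reduces to s(x)(1 − s(x)), which is why the sigmoid is convenient for gradient computations: the derivative can be obtained from the stored output alone.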

7.1.2 Regions in input space

The sigmoid’s output range contains all numbers strictly between 0 and 1. Both extreme values
can only be reached asymptotically. The computing units considered in this chapter evaluate the
sigmoid using the net amount of excitation as its argument. Given weights w1,...,wn and a bias
−θ, a sigmoidal unit computes for the input x1,...,xn the output

$$ \frac{1}{1 + \exp\left(-\left(\sum_{i=1}^{n} w_i x_i - \theta\right)\right)}. $$

A higher net amount of excitation brings the unit’s output nearer to 1. The continuum of output
values can be compared to a division of the input space in a continuum of classes. A higher value
of c makes the separation in input space sharper.
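
As a rough illustration, a sigmoidal unit can be written in a few lines. This is a sketch under the assumption that the unit applies the logistic sigmoid to the weighted sum of its inputs minus the threshold θ; the name `unit_output` and the sample weights are hypothetical.

```python
import math

def unit_output(weights, inputs, theta, c=1.0):
    """Output of one sigmoidal unit: s_c(sum_i w_i * x_i - theta)."""
    net = sum(w * x for w, x in zip(weights, inputs)) - theta
    return 1.0 / (1.0 + math.exp(-c * net))

# Points with zero net excitation lie on the boundary between the classes
# and give the output 0.5.
print(unit_output([1.0, 1.0], [0.5, 0.5], theta=1.0))
# A higher net excitation moves the output towards 1, faster for larger c.
print(unit_output([1.0, 1.0], [1.0, 1.0], theta=1.0, c=1.0))
print(unit_output([1.0, 1.0], [1.0, 1.0], theta=1.0, c=3.0))
```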

7.1.3 Local minima of the error function

A price has to be paid for all the positive features of the sigmoid as activation function. The most
important problem is that, under some circumstances, local minima appear in the error function
which would not be there if the step function had been used. Figure 7.5 shows an example of a
local minimum with a higher error level than in other regions. The function was computed for a
single unit with two weights, constant threshold, and four input-output patterns in the training set.
There is a valley in the error function and if gradient descent is started there the algorithm will not
converge to the global minimum.

In many cases local minima appear because the targets for the outputs of the computing units are
values other than 0 or 1. If a network for the computation of XOR is trained to produce 0.9 at the
inputs (0,1) and (1,0) then the surface of the error function develops some protuberances, where
local minima can arise. In the case of binary target values some local minima are also present, as
shown by Lisboa and Perantonis who analytically found all local minima of the XOR function
[277].

7.2 General feed-forward networks

In this section we show that backpropagation can easily be derived by linking the calculation of
the gradient to a graph labeling problem. This approach is not only elegant, but also more general
than the traditional derivations found in most textbooks. General network topologies are handled
right from the beginning, so that the proof of the algorithm is not reduced to the multilayered case.
Thus one can have it both ways, more general yet simpler [375].

7.2.1 The learning problem

Recall that in our general definition a feed-forward neural network is a computational graph
whose nodes are computing units and whose directed edges transmit numerical information from
node to node. Each computing unit is capable of evaluating a single primitive function of its
input. In fact the network represents a chain of function compositions which transform an input to
an output vector (called a pattern). The network is a particular implementation of a composite
function from input to output space, which we call the network function. The learning problem
consists of finding the optimal combination of weights so that the network function ϕ approximates
a given function f as closely as possible. However, we are not given the function f explicitly but
only implicitly through some examples.

Consider a feed-forward network with n input and m output units. It can consist of any number of
hidden units and can exhibit any desired feed-forward connection pattern. We are also given a
training set $\{(x_1,t_1),\ldots,(x_p,t_p)\}$ consisting of p ordered pairs of n- and m-dimensional
vectors, which are called the input and output patterns. Let the primitive functions at each node of
the network be continuous and differentiable. The weights of the edges are real numbers selected at
random. When the input pattern $x_i$ from the training set is presented to this network, it produces
an output $o_i$ different in general from the target $t_i$. What we want is to make $o_i$ and $t_i$
identical for i = 1,...,p, by using a learning algorithm. More precisely, we want to minimize the
error function of the network, defined as

$$ E = \frac{1}{2} \sum_{i=1}^{p} \| o_i - t_i \|^2. $$

After minimizing this function for the training set, new unknown input patterns are presented to
the network and we expect it to interpolate. The network must recognize whether a new input
vector is similar to learned patterns and produce a similar output. The backpropagation algorithm
is used to find a local minimum of the error function. The network is initialized with randomly
chosen weights. The gradient of the error function is computed and used to correct the initial
weights. Our task is to compute this gradient recursively.

The first step of the minimization process consists of extending the network, so that it computes
the error function automatically. Figure 7.6 shows how this is done. Each of the m output
units of the network is connected to a node which evaluates the function
$\frac{1}{2}(o_{ij} - t_{ij})^2$, where $o_{ij}$ and $t_{ij}$ denote the j-th component of the
output vector $o_i$ and of the target $t_i$. The outputs of the additional m nodes are collected at
a node which adds them up and gives the sum $E_i$ as its output. The same network extension has to
be built for each pattern $t_i$. A computing unit collects all quadratic errors and outputs their
sum $E_1 + \cdots + E_p$. The output of this extended network is the error function E.

We now have a network capable of calculating the total error for a given training set. The weights
in the network are the only parameters that can be modified to make the quadratic error E as low
as possible. Because E is calculated by the extended network exclusively through composition of
the node functions, it is a continuous and differentiable function of the ℓ weights
$w_1, w_2, \ldots, w_\ell$ in the network. We can thus minimize E by using an iterative process of
gradient descent, for which we need to calculate the gradient

$$ \nabla E = \left( \frac{\partial E}{\partial w_1}, \frac{\partial E}{\partial w_2}, \ldots, \frac{\partial E}{\partial w_\ell} \right). $$

Each weight is updated using the increment

$$ \Delta w_i = -\gamma \frac{\partial E}{\partial w_i} \quad \text{for } i = 1, \ldots, \ell, $$

where γ represents a learning constant, i.e., a proportionality parameter which defines the step
length of each iteration in the negative gradient direction. With this extension of the original
network the whole learning problem now reduces to the question of calculating the gradient of a
network function with respect to its weights. Once we have a method to compute this gradient, we
can adjust the network weights iteratively. In this way we expect to find a minimum of the error
function, where ∇E = 0.
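
The iteration described above can be sketched as follows. This is only an illustration under stated assumptions: the training set (the OR function), the single sigmoid unit without threshold, the step length γ = 0.5 and all function names are hypothetical, and the gradient is approximated by central differences as a stand-in, since the efficient method for computing it is what the rest of the chapter develops.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Hypothetical training set: OR function, one sigmoid unit, two weights, no threshold.
TRAINING_SET = [((0.0, 0.0), 0.0), ((0.0, 1.0), 1.0),
                ((1.0, 0.0), 1.0), ((1.0, 1.0), 1.0)]

def total_error(weights):
    """Quadratic error E = 1/2 * sum over all patterns of (o - t)^2."""
    error = 0.0
    for x, t in TRAINING_SET:
        o = sigmoid(sum(w * xi for w, xi in zip(weights, x)))
        error += 0.5 * (o - t) ** 2
    return error

def numerical_gradient(f, weights, h=1e-6):
    """Central-difference approximation of the gradient (stand-in for backpropagation)."""
    grad = []
    for i in range(len(weights)):
        up, down = list(weights), list(weights)
        up[i] += h
        down[i] -= h
        grad.append((f(up) - f(down)) / (2.0 * h))
    return grad

def gradient_descent(weights, gamma=0.5, steps=200):
    """Repeatedly apply the increment delta_w_i = -gamma * dE/dw_i."""
    weights = list(weights)
    for _ in range(steps):
        grad = numerical_gradient(total_error, weights)
        weights = [w - gamma * g for w, g in zip(weights, grad)]
    return weights

initial = [0.0, 0.0]
final = gradient_descent(initial)
print(total_error(initial), total_error(final))
```

The error cannot reach zero in this toy problem because the pattern (0,0) always produces the output 0.5; the descent only drives the other three outputs towards their targets.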

7.2.2 Derivatives of network functions

Now forget everything about training sets and learning. Our objective is to find a method for
efficiently calculating the gradient of a one-dimensional network function with respect to the
weights of the network. Because the network is equivalent to a complex chain of function
compositions, we expect the chain rule of differential calculus to play a major role in finding the
gradient of the function. We take account of this fact by giving the nodes of the network a
composite structure. Each node now consists of a left and a right side, as shown in Figure 7.7. We
call this kind of representation a B-diagram (for backpropagation diagram). The right side
computes the primitive function associated with the node, whereas the left side computes the
derivative of this primitive function for the same input.

Note that the integration function can be separated from the activation function by splitting each
node into two parts, as shown in Figure 7.8. The first node computes the sum of the incoming
inputs, the second one the activation function s. The derivative of s is s′ and the partial derivative
of the sum of n arguments with respect to any one of them is just 1. This separation simplifies our
discussion, as we only have to think of a single function which is being computed at each node
and not of two.

The network is evaluated in two stages: in the first one, the feed-forward step,
information comes from the left and each unit evaluates its primitive function f in its right side as
well as the derivative f′ in its left side. Both results are stored in the unit, but only the result from
the right side is transmitted to the units connected to the right. The second step, the
backpropagation step, consists in running the whole network backwards, whereby the stored
results are now used. There are three main cases which we have to consider.

First case: function composition

The B-diagram of Figure 7.9 contains only two nodes. In the feed-forward step, incoming
information into a unit is used as the argument for the evaluation of the
node’s primitive function and its derivative. In this step the network computes the composition of
the functions f and g. Figure 7.10 shows the state of the network after the feed-forward step. The
correct result of the function composition has been produced at the output unit and each unit has
stored some information on its left side. In the backpropagation step the input from the right of the
network is the constant 1. Incoming information to a node is multiplied by the value stored in its
left side. The result of the multiplication is transmitted to the next unit to the left. We call the
result at each node the traversing value at this node. Figure 7.11 shows the final result of the
backpropagation step, which is f′(g(x))g′(x), i.e., the derivative of the function composition f(g(x))
implemented by this network. The backpropagation step provides an implementation of the chain
rule. Any sequence of function compositions can be evaluated in this way and its derivative can be
obtained in the backpropagation step. We can think of the network as being used backwards with
the input 1, whereby at each node the product with the value stored in the left side is computed.
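
The two-sided nodes of a B-diagram translate almost directly into code. The sketch below is illustrative only: the class `Node`, the helper `chain_derivative` and the choice of g(x) = x² and f = sin are assumptions made for the example, not part of the text.

```python
import math

class Node:
    """A B-diagram node: the right side computes f, the left side stores f'."""

    def __init__(self, f, f_prime):
        self.f = f
        self.f_prime = f_prime
        self.stored_derivative = None

    def forward(self, x):
        # Feed-forward step: store f'(x) in the left side, pass f(x) to the right.
        self.stored_derivative = self.f_prime(x)
        return self.f(x)

    def backward(self, traversing_value):
        # Backpropagation step: multiply the incoming value by the stored derivative.
        return traversing_value * self.stored_derivative

def chain_derivative(nodes, x):
    """Evaluate a chain of nodes at x and return (value, derivative)."""
    value = x
    for node in nodes:
        value = node.forward(value)
    traversing = 1.0  # the constant 1 is fed in from the right
    for node in reversed(nodes):
        traversing = node.backward(traversing)
    return value, traversing

g = Node(lambda x: x * x, lambda x: 2.0 * x)
f = Node(math.sin, math.cos)
value, derivative = chain_derivative([g, f], 1.5)  # f(g(x)) = sin(x^2)
print(value, derivative)
```

The backward loop never re-evaluates f or g; it only multiplies the numbers stored during the feed-forward step, which is what makes the procedure local.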

Second case: function addition

The next case to consider is the addition of two primitive
functions. Figure 7.12 shows a network for the computation of the addition of the functions f1
and f2. The additional node has been included to handle the addition of the two functions. The
partial derivative of the addition function with respect to any one of the two inputs is 1. In the
feed-forward step the network computes the result f1(x) + f2(x). In the backpropagation step the
constant 1 is fed from the right side into the network. All incoming edges to a unit fan out the
traversing value at this node and distribute it to the connected units to the left. Where two
right-to-left paths meet, the computed traversing values are added. Figure 7.13 shows the result
f1′(x) + f2′(x) of the backpropagation step, which is the derivative of the function addition
f1 + f2 evaluated at x. A simple proof by induction shows that the derivative of the addition of
any number of functions can be handled in the same way.

Third case: weighted edges

Weighted edges could be handled in the same manner as function
compositions, but there is an easier way to deal with them. In the feed-forward step the incoming
information x is multiplied by the edge’s weight w. The result is wx. In the backpropagation step
the traversing value 1 is multiplied by the weight of the edge. The result is w, which is the
derivative of wx with respect to x. From this we conclude that weighted edges are used in exactly
the same way in both steps: they modulate the information transmitted in each direction by
multiplying it by the edge’s weight.
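
Addition nodes and weighted edges can be combined in one small example. The following sketch computes F(x) = f1(w1·x) + f2(w2·x) and its derivative in a forward and a backward pass; the choice f1 = sin, f2 = tanh and the name `forward_backward` are assumptions made purely for illustration.

```python
import math

def forward_backward(x, w1, w2):
    """Return F(x) = sin(w1*x) + tanh(w2*x) and dF/dx using stored local derivatives."""
    # Feed-forward step: edges multiply by their weight, nodes store f and f'.
    a1, a2 = w1 * x, w2 * x
    f1, f1_prime = math.sin(a1), math.cos(a1)
    f2, f2_prime = math.tanh(a2), 1.0 - math.tanh(a2) ** 2
    result = f1 + f2  # the addition node

    # Backpropagation step: the constant 1 enters at the output.
    traversing = 1.0
    # The addition node has partial derivative 1 for each input, so the
    # traversing value fans out unchanged to both branches.
    through_f1 = traversing * 1.0 * f1_prime
    through_f2 = traversing * 1.0 * f2_prime
    # Each weighted edge multiplies by its weight; the two right-to-left
    # paths meet at x, where the traversing values are added.
    derivative = through_f1 * w1 + through_f2 * w2
    return result, derivative

print(forward_backward(0.3, 2.0, -1.5))
```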

7.2.3 Steps of the backpropagation algorithm

We can now formulate the complete backpropagation algorithm and prove by induction that it
works in arbitrary feed-forward networks with differentiable activation functions at the nodes. We
assume that we are dealing with a network with a single input and a single output unit.

Implicit in the above analysis is that all inputs to a node are added before the one-dimensional
activation function is computed. We can consider also activation functions f of several variables,
but in this case the left side of the unit stores all partial derivatives of f with respect to each
variable. Figure 7.16 shows an example for a function f of two variables x1 and x2, delivered
through two different edges. In the backpropagation step each stored partial derivative is
multiplied by the traversing value at the node and transmitted to the left through its own edge. It
is easy to see that backpropagation still works in this more general case.

The backpropagation algorithm also works correctly for networks with more than one input unit in
which several independent variables are involved. In a network with two inputs for example,
where the independent variables x1 and x2 are fed into the network, the network result can be
called F(x1,x2). The network function now has two arguments and we can compute the partial
derivative of F with respect to x1 or x2. The feed-forward step remains unchanged and all left side
slots of the units are filled as usual. However, in the backpropagation step we can identify two
subnetworks: one consists of all paths connecting the first input unit to the output unit and another
of all paths from the second input unit to the output unit. By applying the backpropagation step in
the first subnetwork we get the partial derivative of F with respect to x1 at the first input unit. The
backpropagation step on the second subnetwork yields the partial derivative of F with respect to
x2 at the second input unit. Note that we can overlap both computations and perform a single
backpropagation step over the whole network. We still get the same results.
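
A small network with two inputs shows how one backward pass delivers both partial derivatives. The topology below (one hidden sigmoid unit fed by x1 and x2, one sigmoid output fed by the hidden unit and directly by x2) and all weight values are hypothetical, chosen only so that x2 reaches the output along two different paths.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def network(x1, x2, w1=0.7, w2=-0.4, v=1.2, u=0.5):
    """F(x1, x2) = s(v * h + u * x2) with hidden output h = s(w1*x1 + w2*x2)."""
    h = sigmoid(w1 * x1 + w2 * x2)
    return sigmoid(v * h + u * x2), h

def partials(x1, x2, w1=0.7, w2=-0.4, v=1.2, u=0.5):
    """Both partial derivatives of F from a single backpropagation step."""
    F, h = network(x1, x2, w1, w2, v, u)
    # Left-side slots filled during the feed-forward step.
    d_out, d_hidden = F * (1.0 - F), h * (1.0 - h)
    # One backward pass over the whole network, starting with the constant 1.
    at_output = 1.0 * d_out
    at_hidden = at_output * v * d_hidden
    dF_dx1 = at_hidden * w1
    # x2 is reached by two paths, so the traversing values are added.
    dF_dx2 = at_hidden * w2 + at_output * u
    return dF_dx1, dF_dx2

print(partials(0.2, -0.6))
```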

7.2.4 Learning with backpropagation

We consider again the learning problem for neural networks. Since we want to minimize the error
function E, which depends on the network weights, we have to deal with all weights in the
network one at a time. The feed-forward step is computed in the usual way, but now we also store
the output of each unit in its right side. We perform the backpropagation step in the extended
network that computes the error function and we then fix our attention on one of the weights, say
$w_{ij}$, whose associated edge points from the i-th to the j-th node in the network. This weight
can be treated as an input channel into the subnetwork made of all paths starting at $w_{ij}$ and
ending in the single output unit of the network. The information fed into the subnetwork in the
feed-forward step was $o_i w_{ij}$, where $o_i$ is the stored output of unit i. The
backpropagation step computes the gradient of E with respect to this input, i.e.,
$\partial E / \partial (o_i w_{ij})$. Since in the backpropagation step $o_i$ is treated as a
constant, we finally have

$$ \frac{\partial E}{\partial w_{ij}} = o_i \frac{\partial E}{\partial (o_i w_{ij})}. $$

Summarizing, the backpropagation step is performed in the usual way. All subnetworks defined
by each weight of the network can be handled simultaneously, but we now store additionally at
each node i:
• The output $o_i$ of the node in the feed-forward step.
• The cumulative result of the backward computation in the backpropagation step up to this node.
We call this quantity the backpropagated error.
If we denote the backpropagated error at the j-th node by $\delta_j$, we can then express the
partial derivative of E with respect to $w_{ij}$ as:

$$ \frac{\partial E}{\partial w_{ij}} = o_i \delta_j. $$
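
The relation between the stored output, the backpropagated error and the partial derivative can be checked on the smallest possible case. The sketch below assumes a chain of two sigmoid units joined by one weight w and a quadratic error at the output; the function names and the numerical values are hypothetical.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def forward(x, w):
    """Unit i computes o_i = s(x); unit j receives o_i * w and computes o_j."""
    o_i = sigmoid(x)
    o_j = sigmoid(w * o_i)
    return o_i, o_j

def error(x, w, t):
    """Quadratic error E = 1/2 * (o_j - t)^2 of the extended network."""
    return 0.5 * (forward(x, w)[1] - t) ** 2

def gradient(x, w, t):
    """dE/dw as the product of the stored output o_i and the backpropagated error delta_j."""
    o_i, o_j = forward(x, w)
    # Backward pass: 1 * (o_j - t) from the error node, times s'(net_j) = o_j * (1 - o_j).
    delta_j = (o_j - t) * o_j * (1.0 - o_j)
    return o_i * delta_j

x, w, t, gamma = 0.4, -0.8, 1.0, 0.5
print(gradient(x, w, t))
# One gradient descent correction of the weight, with gamma a hypothetical learning constant.
w_new = w - gamma * gradient(x, w, t)
print(error(x, w, t), error(x, w_new, t))
```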

7.3 The case of layered networks

An important special case of feed-forward networks is that of
layered networks with one or more hidden layers. In this section we give explicit formulas for the
weight updates and show how they can be calculated using linear algebraic operations. We also
show how to label each node with the backpropagated error in order to avoid redundant
computations.

7.3.1 Extended network

We will consider a network with n input sites, k hidden, and m output
units. The weight between input site i and hidden unit j will be called $w^{(1)}_{ij}$. The weight
between hidden unit i and output unit j will be called $w^{(2)}_{ij}$. The bias −θ of each unit is
implemented as the weight of an additional edge. Input vectors are thus extended with a 1
component, and the same is done with the output vector from the hidden layer. Figure 7.17 shows
how this is done. The weight between the constant 1 and the hidden unit j is called
$w^{(1)}_{n+1,j}$ and the weight between the constant 1 and the output unit j is denoted by
$w^{(2)}_{k+1,j}$.

There are (n + 1) × k weights between input sites and hidden units and (k + 1) × m between hidden
and output units. Let $\overline{W}_1$ denote the (n + 1) × k matrix with component
$w^{(1)}_{ij}$ at the i-th row and the j-th column. Similarly let $\overline{W}_2$ denote the
(k + 1) × m matrix with components $w^{(2)}_{ij}$. We use an overlined notation to emphasize that
the last row of both matrices corresponds to the biases of the computing units. The matrix of
weights without this last row will be needed in the backpropagation step. The n-dimensional input
vector $o = (o_1, \ldots, o_n)$ is extended, transforming it to
$\hat{o} = (o_1, \ldots, o_n, 1)$. The excitation $net_j$ of the j-th hidden unit is given by

$$ net_j = \sum_{i=1}^{n+1} w^{(1)}_{ij} \hat{o}_i. $$

The activation function is a sigmoid and the output $o^{(1)}_j$ of this unit is thus

$$ o^{(1)}_j = s\left( \sum_{i=1}^{n+1} w^{(1)}_{ij} \hat{o}_i \right). $$
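
The feed-forward computation with extended vectors and matrices can be sketched with plain nested lists. The names `layer` and `forward` and the sample weights are assumptions for illustration; each matrix has one row per incoming value plus a last row holding the biases, as described above.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def layer(o, W_bar):
    """Extend o with a constant 1 and compute the sigmoid of the product with W_bar.

    W_bar has len(o) + 1 rows; its last row holds the biases of the units.
    """
    o_hat = o + [1.0]
    units = len(W_bar[0])
    return [sigmoid(sum(o_hat[i] * W_bar[i][j] for i in range(len(o_hat))))
            for j in range(units)]

def forward(o, W1_bar, W2_bar):
    """Outputs of the hidden layer and of the output layer for the input vector o."""
    o1 = layer(o, W1_bar)
    o2 = layer(o1, W2_bar)
    return o1, o2

# Hypothetical network with n = 2 input sites, k = 2 hidden units, m = 1 output unit.
W1_bar = [[0.1, -0.2], [0.4, 0.3], [0.0, 0.1]]  # (n + 1) x k
W2_bar = [[0.5], [-0.6], [0.2]]                 # (k + 1) x m
print(forward([1.0, 0.0], W1_bar, W2_bar))
```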

7.3.2 Steps of the algorithm

Figure 7.18 shows the extended network for computation of the error function. In order to
simplify the discussion we deal with a single input-output pair (o,t) and generalize later to p
training examples. The network has been extended with an additional layer of units. The right
sides compute the quadratic deviation $\frac{1}{2}(o^{(2)}_i - t_i)^2$ for the i-th component of
the output vector and the left sides store $(o^{(2)}_i - t_i)$. Each output unit i in the original
network computes the sigmoid s and produces the output $o^{(2)}_i$. Addition of the quadratic
deviations gives the error E. The error function for p input-output examples can be computed by
creating p networks like the one shown, one for each training pair, and adding the outputs of all
of them to produce the total error of the training set. After choosing the weights of the network
randomly, the backpropagation algorithm is used to compute the necessary corrections.

The algorithm can be decomposed in the following four steps:
i) Feed-forward computation
ii) Backpropagation to the output layer
iii) Backpropagation to the hidden layer
iv) Weight updates
The algorithm is stopped when the value of the error function has become sufficiently small.

First step: feed-forward computation

The vector o is presented to the network. The vectors $o^{(1)}$ and $o^{(2)}$ are computed and
stored. The evaluated derivatives of the activation functions are also stored at each unit.

Second step: backpropagation to the output layer

We are looking for the first set of partial derivatives $\partial E/\partial w^{(2)}_{ij}$. The
backpropagation path from the output of the network up to the output unit j is shown in the
B-diagram of Figure 7.19.
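
As a worked sketch of this step, the traversing value can be followed along that path. The formulas below are not quoted from the source; they follow from the rules of Section 7.2 under the assumptions that the output units are sigmoids with $s' = s(1-s)$ and that the error node stores $(o^{(2)}_j - t_j)$ on its left side.

```latex
% Backward path: constant 1 -> error node -> output unit j
\delta^{(2)}_j = 1 \cdot \left(o^{(2)}_j - t_j\right) \cdot o^{(2)}_j \left(1 - o^{(2)}_j\right)

% The weight w^{(2)}_{ij} received the stored output o^{(1)}_i in the feed-forward step
\frac{\partial E}{\partial w^{(2)}_{ij}} = o^{(1)}_i \, \delta^{(2)}_j
```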

Third step: backpropagation to the hidden layer

Now we want to compute the partial derivatives $\partial E/\partial w^{(1)}_{ij}$. Each unit j in
the hidden layer is connected to each unit q in the output layer with an edge of weight
$w^{(2)}_{jq}$, for q = 1,...,m. The backpropagated error up to unit j in the
hidden layer must be computed taking into account all possible backward paths, as shown in
Figure 7.21. The backpropagated error is then

$$ \delta^{(1)}_j = o^{(1)}_j \left(1 - o^{(1)}_j\right) \sum_{q=1}^{m} w^{(2)}_{jq} \delta^{(2)}_q, $$

where $\delta^{(2)}_q$ denotes the backpropagated error at output unit q.
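
The four steps can be put together for a network with one hidden layer. This is a minimal sketch, not the source's implementation: it assumes sigmoid units, a quadratic error for a single training pair, extended weight matrices stored as nested lists with the biases in the last row, and hypothetical names and weight values throughout.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def forward(o, W1, W2):
    """Step i: feed-forward. W1 is (n+1) x k, W2 is (k+1) x m, biases in the last rows."""
    o_hat = o + [1.0]
    o1 = [sigmoid(sum(o_hat[i] * W1[i][j] for i in range(len(o_hat))))
          for j in range(len(W1[0]))]
    o1_hat = o1 + [1.0]
    o2 = [sigmoid(sum(o1_hat[i] * W2[i][j] for i in range(len(o1_hat))))
          for j in range(len(W2[0]))]
    return o_hat, o1, o1_hat, o2

def error(o, t, W1, W2):
    o2 = forward(o, W1, W2)[3]
    return 0.5 * sum((a - b) ** 2 for a, b in zip(o2, t))

def backprop(o, t, W1, W2):
    o_hat, o1, o1_hat, o2 = forward(o, W1, W2)
    # Step ii: backpropagated error at the output units.
    delta2 = [o2[j] * (1.0 - o2[j]) * (o2[j] - t[j]) for j in range(len(o2))]
    # Step iii: backpropagated error at the hidden units. Only the first k rows
    # of W2 are used, because the bias row is not connected to any hidden unit.
    delta1 = [o1[j] * (1.0 - o1[j]) * sum(W2[j][q] * delta2[q] for q in range(len(delta2)))
              for j in range(len(o1))]
    # Partial derivatives: stored output times backpropagated error.
    grad_W2 = [[o1_hat[i] * d for d in delta2] for i in range(len(o1_hat))]
    grad_W1 = [[o_hat[i] * d for d in delta1] for i in range(len(o_hat))]
    return grad_W1, grad_W2

def update(W, grad, gamma):
    """Step iv: weight update delta_w = -gamma * dE/dw."""
    return [[w - gamma * g for w, g in zip(row_w, row_g)] for row_w, row_g in zip(W, grad)]

# Hypothetical 2-2-1 network and a single training pair (o, t).
W1 = [[0.1, -0.2], [0.4, 0.3], [0.0, 0.1]]
W2 = [[0.5], [-0.6], [0.2]]
o, t, gamma = [1.0, 0.0], [1.0], 0.5
g1, g2 = backprop(o, t, W1, W2)
W1_new, W2_new = update(W1, g1, gamma), update(W2, g2, gamma)
print(error(o, t, W1, W2), error(o, t, W1_new, W2_new))
```

Comparing individual entries of the computed gradient with finite differences of `error` is a cheap way to validate an implementation of this kind before using it for training.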

7.3.5 Error during training

We discussed the form of the error function for the XOR problem in
the last chapter. It is interesting to see how backpropagation performs when confronted with this
problem. Figure 7.23 shows the evolution of the total error during training of a network of three
computing units. After 600 iterations the algorithm found a solution to the learning problem. In
the figure the error falls fast at the beginning and end of training. Between these two zones lies a
region in which the error function seems to be almost flat and where progress is slow. This
corresponds to a region which would be totally flat if step functions were used as activation
functions of the units. Now, using the sigmoid, this region presents a small slope in the direction
of the global minimum. In the next chapter we discuss how to make backpropagation converge
faster, taking into account the behavior of the algorithm at the flat spots of the error function.

7.4 Recurrent networks

The backpropagation algorithm can also be extended to the case of
recurrent networks. To deal with this kind of system we introduce a discrete time variable t. At
time t all units in the network recompute their outputs, which are then transmitted at time t+1.
Continuing in this step-by-step fashion, the system produces a sequence of output values when a
constant or time varying input is fed into the network. As we already saw in Chap. 2, a recurrent
network behaves like a finite automaton. The question now is how to train such an automaton to
produce a desired sequence of output values.

7.4.1 Backpropagation through time

The simplest way to deal with a recurrent network is to
consider a finite number of iterations only. Assume for generality that a network of n computing
units is fully connected and that $w_{ij}$ is the weight associated with the edge from node i to
node j. By unfolding the network at the time steps 1,2,...,T, we can think of this recurrent
network as a feed-forward network with T stages of computation. At each time step t an external
input $x^{(t)}$ is fed into the network and the outputs $(o^{(t)}_1, \ldots, o^{(t)}_n)$ of all
computing units are recorded. We call the n-dimensional vector of the units’ outputs at time t the
network state $o^{(t)}$. We assume that the initial values of all units’ outputs are zero at t = 0,
but the external input $x^{(0)}$ can be different from zero. Figure 7.24 shows a diagram of the
unfolded network. This unfolding strategy which converts a recurrent network into a feed-forward
network in order to apply the backpropagation algorithm is called backpropagation through time or
just BPTT [383].

Let W stand for the n×n matrix of network weights $w_{ij}$. Let $W_0$ stand for the m×n matrix of
interconnections between m input sites and n units. The feed-forward step is computed in the usual
manner, starting with an initial m-dimensional external input $x^{(0)}$. At each time step t the
network state $o^{(t)}$ (an n-dimensional row vector) and the vector of derivatives of the
activation function at each node $o'^{(t)}$ are stored. The error of the network can be measured
after each time step if a sequence of values is to be produced, or just after the final step T if
only the final output is of importance. We will handle the first, more general case. Denote the
difference between the n-dimensional target $y^{(t)}$ at time t and the output of the network by
$e^{(t)} = (o^{(t)} - y^{(t)})^T$. This is an n-dimensional column
vector, but in most cases we are only interested in the outputs of some units in the network. In that
case
Things become complicated when we consider that each weight in the network is present at each
stage of the unfolded network. Until now we had only handled the case of unique weights.
However, any network with repeated weights can easily be transformed into a network with
unique weights. Assume that after the feed-forward step the state of the network is the one shown
in Figure 7.25. Weight w is duplicated, but received different inputs o1 and o2 in the feed-forward
step at the two different locations in the network. The transformed network in Figure 7.26 is
indistinguishable from the original network from the viewpoint of the results it produces. Note
that the two edges associated with weight w now have weight 1 and a multiplication is performed
by the two additional units in the middle of the edges. In this transformed network w appears only
once and we can perform backpropagation as usual. There are two groups of paths, the ones
coming from the first multiplier to w
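
A repeated weight can be illustrated with the smallest recurrent network: one sigmoid unit whose output is fed back through a single weight w. The sketch below rests on assumptions that are not taken from the text: the error is measured only after the final step, the external input is added directly to the excitation, and the function names are hypothetical. It uses the standard result that the derivative with respect to a shared weight is the sum of the contributions collected at each of its occurrences in the unfolded network.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def unfold(w, xs):
    """Feed-forward step over len(xs) stages; outputs[0] is the initial state 0."""
    outputs = [0.0]
    for x in xs:
        outputs.append(sigmoid(w * outputs[-1] + x))
    return outputs

def final_error(w, xs, y):
    """Quadratic error measured only after the final step T."""
    return 0.5 * (unfold(w, xs)[-1] - y) ** 2

def bptt_gradient(w, xs, y):
    """dE/dw for the shared weight w, summed over all stages of the unfolded network."""
    outputs = unfold(w, xs)
    o_T = outputs[-1]
    delta = (o_T - y) * o_T * (1.0 - o_T)  # backpropagated error at stage T
    grad = 0.0
    for t in range(len(xs), 0, -1):
        # The copy of w at stage t received the input outputs[t - 1].
        grad += delta * outputs[t - 1]
        # Carry the error one stage back: through w and through the sigmoid of
        # stage t - 1. At stage 0 the state is the constant 0 and the value is unused.
        prev = outputs[t - 1]
        delta = delta * w * prev * (1.0 - prev)
    return grad

xs, y, w = [0.5, -0.3, 0.8], 1.0, 0.9
print(bptt_gradient(w, xs, y))
```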

7.4.2 Hidden Markov Models

Hidden Markov Models (HMM) form an important special type of
recurrent network. A first-order Markov model is any system capable of assuming one of n
different states at time t. The system does not change its state at each time step deterministically
but according to a stochastic dynamics. The probability of transition from the i-th to the j-th state
at each step is given by $0 \le a_{ij} \le 1$ and does not depend on the previous history of
transitions. These probabilities can be arranged in an n × n matrix A. We also assume that at each
step the model emits one of m possible output values. We call the probability of emitting the k-th
output value while in the i-th state $b_{ik}$. Starting from a definite state at time t = 0, the
system is allowed to run for T time units and the generated outputs are recorded. Each new run of
the system generally produces a different sequence of output values. The system is called an HMM
because only the emitted values, not the state transitions, can be observed.

An example may make this point clear. In speech recognition researchers postulate that the vocal
tract shapes can be quantized in a discrete set of states roughly associated with the phonemes which
compose speech. When speech is recorded the exact transitions in the vocal tract cannot be observed
and only the produced sound can be measured at some predefined time intervals. These are the
emissions, and the states of the system are the quantized configurations of the vocal tract. From the
measurements we want to infer the sequence of states of the vocal tract, i.e., the sequence of
utterances which gave rise to the recorded sounds. In order to make this problem manageable, the
set of states and the set of possible sound parameters are quantized (see Chap. 9 for a deeper
discussion of automatic speech recognition).

The general problem when confronted with the
recorded sequence of output values of an HMM is to compute the most probable sequence of state
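
The generative side of such a model, i.e. how a run produces its emissions, can be sketched with the standard library. The matrices A and B below are made-up examples with n = 2 states and m = 3 output values, the name `run_hmm` is hypothetical, and the order "emit in the current state, then change state" is an assumption, since the text does not fix it.

```python
import random

def run_hmm(A, B, start_state, T, rng):
    """Run the model for T steps and return (hidden states, observed emissions).

    A[i][j] is the probability of a transition from state i to state j,
    B[i][k] the probability of emitting output value k while in state i.
    """
    state = start_state
    states, emissions = [], []
    for _ in range(T):
        states.append(state)
        # Emit one of the m output values according to row `state` of B.
        emissions.append(rng.choices(range(len(B[state])), weights=B[state])[0])
        # Move to the next state according to row `state` of A.
        state = rng.choices(range(len(A)), weights=A[state])[0]
    return states, emissions

# Hypothetical model: every row of A and B sums to 1.
A = [[0.9, 0.1],
     [0.3, 0.7]]
B = [[0.6, 0.3, 0.1],
     [0.1, 0.2, 0.7]]

states, emissions = run_hmm(A, B, start_state=0, T=20, rng=random.Random(0))
print(emissions)  # an observer sees only this sequence, not `states`
```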

7.4.3 Variational problems

Our next example deals not with a recurrent network, but with a class
of networks built of many repeated stages. Variational problems can also be expressed and solved
numerically using backpropagation networks. A variational problem is one in which we are
looking for a function which can optimize a certain cost function. Usually cost is expressed
analytically in terms of the unknown function and finding a solution is in many cases an
extremely difficult problem. An example can illustrate the general technique that can be used.
Assume that the problem is to minimize P with two boundary conditions:

18 pages
Radial Basis Function Networks: The Structure of The RBF Networks
No ratings yet
Radial Basis Function Networks: The Structure of The RBF Networks
8 pages
Learning Rules For Multilayer Feedforward Neural Networks
No ratings yet
Learning Rules For Multilayer Feedforward Neural Networks
19 pages
Future Scope and Conclusion
No ratings yet
Future Scope and Conclusion
13 pages
Feed Forward Neural Network Assignment PDF
No ratings yet
Feed Forward Neural Network Assignment PDF
11 pages
Greek Numbers
100% (1)
Greek Numbers
6 pages
Mathematics of Deep Learning: Lecture 1-Introduction and The Universality of Depth 1 Nets
No ratings yet
Mathematics of Deep Learning: Lecture 1-Introduction and The Universality of Depth 1 Nets
12 pages
Math 5 QTR 2 Week 1
No ratings yet
Math 5 QTR 2 Week 1
8 pages
Notes Chapter8
No ratings yet
Notes Chapter8
4 pages
DL Exam 2023-2
No ratings yet
DL Exam 2023-2
5 pages
YT +Quadratic+equations+Top+DPPs +11th+elite
No ratings yet
YT +Quadratic+equations+Top+DPPs +11th+elite
64 pages
Lesson04 PDF
No ratings yet
Lesson04 PDF
51 pages
Cis Extended Booklist
100% (1)
Cis Extended Booklist
26 pages
Unit 1 Notes
No ratings yet
Unit 1 Notes
18 pages
Compiled By: Bharath Annamaneni & Hari Vardhan Yerramsetty For
No ratings yet
Compiled By: Bharath Annamaneni & Hari Vardhan Yerramsetty For
60 pages
Project Num Boost
No ratings yet
Project Num Boost
5 pages
7& 9 Autoencoder and Variational Autoencoder
No ratings yet
7& 9 Autoencoder and Variational Autoencoder
13 pages
Lecture Pt. 3
No ratings yet
Lecture Pt. 3
39 pages
Area Under Curve
No ratings yet
Area Under Curve
4 pages
JEE Main 2025 Math Checklist
No ratings yet
JEE Main 2025 Math Checklist
2 pages
Trends in Power and Energy in Integrated Circuits
No ratings yet
Trends in Power and Energy in Integrated Circuits
21 pages
Memory Technology
No ratings yet
Memory Technology
26 pages
CLASS 10th MATHS (Standar) HALF YEARLY 2024-25
No ratings yet
CLASS 10th MATHS (Standar) HALF YEARLY 2024-25
6 pages
Secondary 1 G3 Math - Approximation and Estimation
No ratings yet
Secondary 1 G3 Math - Approximation and Estimation
21 pages
Depende Bali Ty
No ratings yet
Depende Bali Ty
9 pages
Test 34 + Answer Key
No ratings yet
Test 34 + Answer Key
7 pages
Maths 1 Question Paper Nov Dec 2010
No ratings yet
Maths 1 Question Paper Nov Dec 2010
4 pages
XI STD Volume - I Book Back One Word Practice Question Paper
No ratings yet
XI STD Volume - I Book Back One Word Practice Question Paper
12 pages
HMM
No ratings yet
HMM
25 pages
Daily Test 1st of Global Math Grade 5 (AutoRecovered)
No ratings yet
Daily Test 1st of Global Math Grade 5 (AutoRecovered)
3 pages
The Analysis of Runge Phenomenon
No ratings yet
The Analysis of Runge Phenomenon
29 pages
Math 208 Questions With 100
No ratings yet
Math 208 Questions With 100
3 pages
Sp14 Gurukul School
No ratings yet
Sp14 Gurukul School
8 pages
On The Average Number of Maxima in A Set of Vectors and Applications (Jon L Bentley) (1978)
No ratings yet
On The Average Number of Maxima in A Set of Vectors and Applications (Jon L Bentley) (1978)
8 pages
1.1 Ejercicios Progresiones NS Sols
No ratings yet
1.1 Ejercicios Progresiones NS Sols
2 pages
CheatSheet BBE
No ratings yet
CheatSheet BBE
2 pages
Birla Institute of Technology and Science, Pilani Pilani Campus
No ratings yet
Birla Institute of Technology and Science, Pilani Pilani Campus
3 pages
'Ishnath Pathak and IIT' (Part 01 of 02)
No ratings yet
'Ishnath Pathak and IIT' (Part 01 of 02)
4 pages


