
Neural Networks: Forward and Backpropagation

Nikhil Sardana
October 2017

1 Introduction
We have covered Perceptrons, the fundamental unit of the Neural Network (see "Neural Networks: Introduction and Overview" for the previous lecture). Additionally, Multi-Layer Perceptrons, or Neural Networks, were introduced as a solution for approximating non-linearly separable data. This lecture continues exploring Neural Networks, but introduces vectorization for more efficient notation and computation. We also cover how neural networks learn, which is significantly more complex than the perceptron learning algorithm.

2 The Neuron
A single node of a neural network (a neuron) differs from a perceptron in one
way: the activation function. Consider this diagram of a neuron:

[Figure: diagram of a single neuron with inputs, weights, a bias, and activation σ.]

The symbol σ represents the Sigmoid activation function σ(x) = 1/(1 + e^{−x}).

[Figure: graphs of σ(x) = 1/(1 + e^{−x}), g(x) = 1/(1 + e^{−5x}), and h(x) = 1/(1 + e^{−100x}) for −5 ≤ x ≤ 5.]
Notice how as the coefficient of x approaches infinity, σ(x) approaches the
step function from before. We use σ(x) because it is differentiable, which is
necessary for networks to learn. Other activation functions include tanh(x) and
ReLU, but we will use Sigmoid for our examples.
The rest of a neuron is identical to a perceptron: multiply each input by its
weight, add them together with the bias, and compute the activation function
of the sum.
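
As a minimal sketch in Python with NumPy (the library the problems at the end assume), a neuron might look like the following; the input, weight, and bias values are arbitrary illustrative choices, not anything fixed by the lecture:

import numpy as np

def sigmoid(x):
    # Sigmoid activation: maps any real number into (0, 1)
    return 1 / (1 + np.exp(-x))

def neuron(inputs, weights, bias):
    # Identical to a perceptron except for the activation function:
    # weight each input, add the bias, then apply sigmoid to the sum.
    return sigmoid(np.dot(weights, inputs) + bias)

# Two inputs with arbitrary illustrative values
print(neuron(np.array([0.2, 0.3]), np.array([4.0, 5.0]), 1.0))  # ~0.964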

3 Forward Propagation
3.1 Non-Vectorized Forward Propagation
Forward Propagation is a fancy term for computing the output of a neural
network. We must compute all the values of the neurons in the second layer
before we begin the third, but we can compute the individual neurons in any
given layer in any order. Consider the following network:

[Figure: a network with input nodes n1 and n2, hidden nodes n3, n4, n5, and output node n6.]
We denote the value of node i as ni, and the bias of node i as bi. Computing
the network using these variables, we get:

n3 = σ(w13 n1 + w23 n2 + b3)
n4 = σ(w14 n1 + w24 n2 + b4)
n5 = σ(w15 n1 + w25 n2 + b5)
n6 = σ(w36 n3 + w46 n4 + w56 n5 + b6)
Continuing this example of forward propagation, let's assign some numbers
and compute the output of this network. Let n1 = 0.2 and n2 = 0.3. Let
w13 = 4, w14 = 5, w15 = 6, w23 = 5, w24 = 6, w25 = 7, w36 = 9, w46 = 10 and
w56 = 11, just so they are easy to remember. Let all the biases b3..6 = 1 (input
nodes do not have biases; the "input nodes" are simply values given to the
network). In practice, the weights and biases of a network are initialized randomly
between −1 and 1. Given these numbers, we compute:

n3 = σ(4 ∗ 0.2 + 5 ∗ 0.3 + 1) = σ(3.3) = 0.964
n4 = σ(5 ∗ 0.2 + 6 ∗ 0.3 + 1) = σ(3.8) = 0.978
n5 = σ(6 ∗ 0.2 + 7 ∗ 0.3 + 1) = σ(4.3) = 0.987
n6 = σ(9 ∗ 0.964 + 10 ∗ 0.978 + 11 ∗ 0.987 + 1) = σ(30.313) ≈ 1
This example actually illustrates one of the weak points of the Sigmoid
function: it quickly approaches 1 for large numbers. The reason for using the
Sigmoid function will be shown in the section on backpropagation.
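
For reference, here is a minimal non-vectorized sketch in Python that reproduces the computation above, one neuron at a time:

import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

# Values from the worked example above
n1, n2 = 0.2, 0.3
w13, w14, w15 = 4, 5, 6
w23, w24, w25 = 5, 6, 7
w36, w46, w56 = 9, 10, 11
b3 = b4 = b5 = b6 = 1

# The hidden layer must be finished before the output layer,
# but n3, n4, n5 themselves could be computed in any order.
n3 = sigmoid(w13 * n1 + w23 * n2 + b3)             # sigmoid(3.3)    ~ 0.964
n4 = sigmoid(w14 * n1 + w24 * n2 + b4)             # sigmoid(3.8)    ~ 0.978
n5 = sigmoid(w15 * n1 + w25 * n2 + b5)             # sigmoid(4.3)    ~ 0.987
n6 = sigmoid(w36 * n3 + w46 * n4 + w56 * n5 + b6)  # sigmoid(30.313) ~ 1
print(n3, n4, n5, n6)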

3.2 Vectorized Forward Propagation


Look again at these nodes of the network:

n3 = σ(w13 n1 + w23 n2 + b3)
n4 = σ(w14 n1 + w24 n2 + b4)
n5 = σ(w15 n1 + w25 n2 + b5)
We can rewrite this as
     !
n3 w13 w23   b3
n
n4  = σ w14 w24  1 + b4 
n2
n5 w15 w25 b5

Notice how the nodes in each layer of the network are in their own column
vector, in the order they appear. Let’s relabel this network by layers:

[Figure: the same network relabeled by layers: input layer x0, hidden layer x1, and output layer x2, connected by weight matrices w1 and w2.]

Here, x0 and x2 represent the input and output layers, and x1 is the middle
layer (called a hidden layer). Mathematically speaking, these are represented
as column vectors of dimension n × 1, where n is the number of nodes in the
layer. Thinking back to the non-vectorized network in section 3.1,
 
x0 = (n1, n2)^T    x1 = (n3, n4, n5)^T    x2 = (n6)

w1 and w2 are the weight matrices. Thinking back to the non-vectorized network
in section 3.1, w1 corresponds to

     [w13 w23]
     [w14 w24]
     [w15 w25]

and w2 refers to

     [w36 w46 w56]
Each layer (except the input) also has a bias vector, which has the same
dimension as the layer itself (each node has a bias). Again thinking back to the
non-vectorized network in section 3.1, we define b1 to be

     (b3, b4, b5)^T

and b2 to be

     (b6)
We can now re-write the forward propagation formula in a far more compact
form. In any n-layer network, for a given layer x_{i+1} (assuming 0 ≤ i < n − 1):

x_{i+1} = σ(w_{i+1} x_i + b_{i+1})
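
In code, this reduces to one line per layer. Below is a minimal vectorized sketch in Python with NumPy; the 2–3–1 shapes and the random initialization between −1 and 1 follow the example network above, but the function names are my own:

import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def forward(x0, weights, biases):
    # One line per layer: x_{i+1} = sigmoid(w_{i+1} x_i + b_{i+1})
    x = x0
    for w, b in zip(weights, biases):
        x = sigmoid(w @ x + b)
    return x

# A 2-3-1 network like the example above: w1 is 3x2, w2 is 1x3,
# with weights and biases initialized randomly between -1 and 1
rng = np.random.default_rng(0)
weights = [rng.uniform(-1, 1, (3, 2)), rng.uniform(-1, 1, (1, 3))]
biases = [rng.uniform(-1, 1, (3, 1)), rng.uniform(-1, 1, (1, 1))]
x0 = np.array([[0.2], [0.3]])  # 2x1 input column vector
print(forward(x0, weights, biases))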

4 Backpropagation
Backpropagation is how neural networks learn. It is essential to understand
not only the theory behind backpropagation, but also the mathematics behind
it. This is one of the few mathematically rigorous sections of our material. (Of
course, for anyone who has taken multivariable calculus, the material should
be relatively straightforward. Nevertheless, high school students not entirely
comfortable with the math will no doubt have some trouble.)

4.1 Learning
A neural network learns when it is given training data and labels. The data
(inputs) can be in the form of text, images, numbers, etc. The label is the ground
truth, the correct answer for the given input. Given enough data-label pairs,
a network can learn to generalize the relationship between the data and label.
After training, it is tested or validated on a set of data it has never seen before
(i.e. data not part of the training set). This validation accuracy shows just how
well a network has learned to generalize through training. Backpropagation is
the method of updating the weights and biases of the network to minimize the
error when training.

4.2 Error
Consider the following network:
[Figure: a 3–4–2 network with layers x0, x1, x2 and weight matrices W1, W2.]

For the input x0, let y represent the target vector, or the ground truth. We
define the error as

E = (1/2) ||x2 − y||^2
Essentially, this is the magnitude of the difference between the target and
the network’s output. In order for a network to become more accurate, we want
to minimize this error.
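
In NumPy, this error is a one-liner; a minimal sketch with made-up output and target vectors:

import numpy as np

def error(x2, y):
    # E = (1/2) ||x2 - y||^2
    return 0.5 * np.linalg.norm(x2 - y) ** 2

# Made-up 2x1 output and target column vectors
print(error(np.array([[0.9], [0.2]]), np.array([[1.0], [0.0]])))  # 0.025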
Let’s think of E as a function. Only x2 can vary, and we can only control
this by changing the weight matrices (and the bias). Thus, for a neuron with n
weights and a bias, the error can be graphed as an n + 2 dimensional function

(y = f(x) has 1 input, so it is graphed in two dimensions). For this network,
each of the weights (3 ∗ 4 + 4 ∗ 2 = 20) and the biases (6) determines the error,
so the error has many, many dimensions. If we get to the minimum of this
function, we have minimized the error and trained the network.

4.3 Gradient Descent


We can't visualize that many dimensions (at least, I can't), so let's pretend we
are working with a three-dimensional function. How do we get to the minimum?
We use gradient descent, of course!

[Figure: a multi-dimensional function. Look at that minimum!]

Gradient descent is simple: starting at some point, we move in the direction
of steepest decline for a certain length. Then, at our new point, we again
compute the direction of steepest decline, and move in that direction for a
certain length. We repeat this process over and over until every single direction
is an incline, at which point we are at the minimum.

This has three issues. First, how do we know how long our steps are? Take a
step too long, and we could overshoot the minimum. Take a step too short, and
it will take us many steps to reach the minimum. The step length is actually
just a constant set by the programmer, and normally ranges from 0.1 to 0.0001.
Adjusting this constant to get the best results is an important practical topic,
and we will discuss it in Part 3 of the lecture. For now, just know it's a constant.
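
Here is a minimal sketch of the loop just described, in Python, minimizing the one-dimensional parabola f(x) = x² (whose derivative, 2x, is worked out a few paragraphs below); the step length and starting point are arbitrary choices:

def gradient_descent(grad, x, step=0.1, steps=100):
    # Repeatedly move against the gradient (the direction of steepest ascent)
    for _ in range(steps):
        x = x - step * grad(x)
    return x

# Minimize f(x) = x^2, whose derivative f'(x) = 2x is derived below
print(gradient_descent(lambda x: 2 * x, x=5.0))  # close to the minimum at 0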
Secondly, doesn’t gradient descent just get us to a minimum? What if there
are multiple minima, and we just happen to land in a local minimum, like the
many in the function below?

[Figure: a function with many local minima.]
Getting out of local minima to reach the global minimum is another impor-
tant machine learning topic. Different optimizers can help the network pop out
of local minima using momentum, but this topic is complex and modern, so it
is covered in depth in Part 3 of this lecture. For the purposes of explaining
gradient descent, we’ll just pretend we’re working with an error function with
one minimum.
The third and final issue is: how do we know which direction is the steepest?
We can’t just sample each direction, as there are infinite possibilities. Instead,
we mathematically compute the best direction. Let’s consider a simple two-
dimensional parabola:

From elementary calculus, we know that:

f(x) = x²
f′(x) = 2x
The derivative gives us the instantaneous rate of change for any x. If we
have a function in terms of x and y, we can take the derivative of f(x, y) with
respect to x to find the rate of change in the x direction, and the derivative with
respect to y to find the rate of change in the y direction. These are called partial
derivatives. When taking a partial derivative with respect to one variable, we
treat the other variables as constants.

7
Let's do an example. Given f(x, y) = 2x² + 3xy + y³, the partial derivatives
are:

∂f/∂x = 4x + 3y
∂f/∂y = 3x + 3y²

The gradient of f(x, y), or ∇f(x, y), is just the vector:

(∂f/∂x, ∂f/∂y)

For our example, the gradient is:

(4x + 3y, 3x + 3y²)
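
As a quick sanity check, here is a minimal sketch that compares these partial derivatives against finite-difference approximations at an arbitrarily chosen point:

def f(x, y):
    return 2 * x**2 + 3 * x * y + y**3

def gradient(x, y):
    # (df/dx, df/dy) from the formulas above
    return (4 * x + 3 * y, 3 * x + 3 * y**2)

# Compare against central finite differences at an arbitrary point
x, y, h = 1.0, 2.0, 1e-6
approx = ((f(x + h, y) - f(x - h, y)) / (2 * h),
          (f(x, y + h) - f(x, y - h)) / (2 * h))
print(gradient(x, y))  # (10.0, 15.0)
print(approx)          # approximately (10.0, 15.0)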

This is the direction of steepest ascent. How do we know that? First, let's
consider the directional derivative. ∇_u f(x0, y0) is the rate of change of f(x, y)
at the point (x0, y0) in the direction of the unit vector u. It is also defined in
terms of the gradient as:

∇_u f(x0, y0) = ∇f(x0, y0) · u
We know from our standard dot product rule:

a · b = ||a|| ||b|| cos(θ)

And cos(θ) is maximized at θ = 0. Thus, when two vectors are in the same
direction, their dot product is maximized. From this information, the maximum
of the directional derivative must be when ∇f(x, y) and u are in the same
direction. This means that the direction of steepest ascent (maximum rate of
change) is the direction of the gradient.
Great! Now our third issue has been solved. In order to find the minimum
of a multi-dimensional function, we just need to compute the gradient, move in
that direction for a certain length, and repeat until the gradient is 0. The only
problem is: how do we compute the gradient? Our function is

E(W, b) = (1/2) ||o − t||^2

where o is the network output and t is the target. Since the error is in terms
of the weights and biases, we need to compute:

(∂E/∂W1, ∂E/∂W2, ..., ∂E/∂bn)
This is why backpropagation is a fundamental concept in machine learning.
It allows us to compute this gradient in a computationally efficient manner.

4.4 Non-Vectorized Backpropagation


Don’t do it. Seriously.

4.5 Vectorized Backpropagation
Consider the network from Section 4.2 again.

[Figure: the 3–4–2 network from Section 4.2, with layers x0, x1, x2 and weight matrices W1, W2.]

Ignoring biases (which we will see follow a relatively simple rule), we know
from forward propagation that:

x1 = σ(W1 x0)
x2 = σ(W2 x1)

And the error is, assuming some 2 × 1 target vector y:

E = (1/2) ||x2 − y||^2
Let's first take the partial derivative of E with respect to W2. This is just
like taking a normal derivative (using the chain rule).

∂E/∂W2 = (x2 − y) ∂(σ(W2 x1))/∂W2

∂E/∂W2 = [(x2 − y) ⊙ σ′(W2 x1)] ∂(W2 x1)/∂W2

Here, ⊙ is the Hadamard product, or element-wise multiplication (remember,
these are all vectors). For the sake of simplification, let's define

δ2 = (x2 − y) ⊙ σ′(W2 x1)

Then, we can rewrite the partial as

∂E/∂W2 = δ2 ∂(W2 x1)/∂W2 = δ2 x1^T

Note that x1^T means that the x1 vector has been transposed (i.e. it is a row
vector). This is essential for the dimensions to work out, which we can check
now.

Since the whole point is to update the weights by some factor every time we
backpropagate in the direction of fastest descent to minimize the error, we want
to subtract the partial matrix (since it is in the direction of fastest ascent):

Wi = Wi − α ∂E/∂Wi

where α is the learning rate. This requires ∂E/∂Wi to have the same dimensions
as Wi. Using W2 as an example, we know that

x2 = σ(W2 x1)

where x2 is a 2 × 1 vector and x1 is a 4 × 1 vector, so W2 is a 2 × 4 matrix. Thus, both
∂E/∂W2 and δ2 x1^T are also 2 × 4 matrices. Since δ2 = (x2 − y) ⊙ σ′(W2 x1),
and we know y is a 2 × 1 vector, δ2 has dimensions 2 × 1. If δ2 is 2 × 1, then it
must be multiplied by a 1 × 4 vector to create a 2 × 4 matrix. Since x1 is 4 × 1,
it must be transposed to become 1 × 4.
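
This dimension bookkeeping can also be checked mechanically. A minimal sketch, with randomly chosen values standing in for the 3–4–2 network (the identity σ′(x) = σ(x)(1 − σ(x)), stated at the end of this section, is used to compute σ′):

import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

rng = np.random.default_rng(0)
W2 = rng.uniform(-1, 1, (2, 4))  # maps the 4x1 hidden layer to the 2x1 output
x1 = rng.uniform(0, 1, (4, 1))   # stand-in hidden layer values
y = np.array([[1.0], [0.0]])     # 2x1 target

x2 = sigmoid(W2 @ x1)
delta2 = (x2 - y) * (x2 * (1 - x2))  # elementwise * is the Hadamard product
grad_W2 = delta2 @ x1.T              # (2x1)(1x4) -> 2x4
assert grad_W2.shape == W2.shape     # same shape as W2, as required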
Let's continue to the next weight matrix.

∂E/∂W1 = (x2 − y) ∂(σ(W2 x1))/∂W1

∂E/∂W1 = [(x2 − y) ⊙ σ′(W2 x1)] ∂(W2 x1)/∂W1

∂E/∂W1 = δ2 ∂(W2 x1)/∂W1 = W2^T δ2 ∂x1/∂W1

Substituting in for x1, we get:

∂E/∂W1 = W2^T δ2 ∂(σ(W1 x0))/∂W1

∂E/∂W1 = [W2^T δ2 ⊙ σ′(W1 x0)] ∂(W1 x0)/∂W1

Again, we simplify this:

δ1 = W2^T δ2 ⊙ σ′(W1 x0)

and we finish with

∂E/∂W1 = δ1 ∂(W1 x0)/∂W1 = δ1 x0^T

We can generalize this for any layer. The only difference is the delta for the
last layer:

δL = (xL − y) ⊙ σ′(WL x_{L−1})

The delta for every other layer is:

δi = W_{i+1}^T δ_{i+1} ⊙ σ′(Wi x_{i−1})

And the gradient for every weight matrix is calculated, and the weight
matrices are updated, as follows:

∂E/∂Wi = δi x_{i−1}^T

Wi = Wi − α ∂E/∂Wi

For biases, the rule is simpler:

bi = bi − α δi
That is the essence of backpropagation. Note that these formulas work for
any activation function. The reason Sigmoid is used for teaching is that its
derivative is fairly straightforward:

σ′(x) = σ(x)(1 − σ(x))
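
Putting it all together, here is a minimal vectorized sketch of one backpropagation step in Python with NumPy, following the delta and update rules above; the 2–3–1 layer sizes, helper names, and training loop are illustrative choices, not fixed by the lecture:

import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_prime(x):
    # sigma'(x) = sigma(x)(1 - sigma(x))
    s = sigmoid(x)
    return s * (1 - s)

def train_step(x0, y, weights, biases, alpha=0.1):
    # Forward pass, keeping each pre-activation z_i = W_i x_{i-1} + b_i
    xs, zs = [x0], []
    for W, b in zip(weights, biases):
        zs.append(W @ xs[-1] + b)
        xs.append(sigmoid(zs[-1]))

    # Backward pass: delta_L = (x_L - y) (Hadamard) sigma'(z_L)
    delta = (xs[-1] - y) * sigmoid_prime(zs[-1])
    for i in reversed(range(len(weights))):
        prev_delta = None
        if i > 0:
            # delta_i = W_{i+1}^T delta_{i+1} (Hadamard) sigma'(z_i),
            # computed with the old weights before they are updated
            prev_delta = (weights[i].T @ delta) * sigmoid_prime(zs[i - 1])
        weights[i] -= alpha * (delta @ xs[i].T)  # dE/dW_i = delta_i x_{i-1}^T
        biases[i] -= alpha * delta               # bias rule: b_i -= alpha delta_i
        delta = prev_delta
    return 0.5 * np.linalg.norm(xs[-1] - y) ** 2  # error before this update

# Train a 2-3-1 network on a single made-up data-label pair
rng = np.random.default_rng(0)
weights = [rng.uniform(-1, 1, (3, 2)), rng.uniform(-1, 1, (1, 3))]
biases = [rng.uniform(-1, 1, (3, 1)), rng.uniform(-1, 1, (1, 1))]
x0, y = np.array([[0.2], [0.3]]), np.array([[0.7]])
for _ in range(1000):
    err = train_step(x0, y, weights, biases)
print(err)  # shrinks toward 0 as the network fits the pair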

5 Problems
1. Given the following network:

[Figure: a 3–4–2 network with layers x0, x1, x2 and weight matrices W1, W2.]

the weight matrices, bias vectors and input are as follows:

     [2 3 4]
W1 = [2 1 2]   W2 = [3 1 1 1]
     [3 5 1]        [1 4 2 2]
     [2 3 4]

x0 = (2, 1, 3)^T   b1 = (4, 1, 1, 2)^T   b2 = (2, 3)^T

Instead of using the Sigmoid activation function, use a linear function
y = x, which always has a derivative of 1. Compute the output of one
forward pass, then compute a backward pass using the following target
and learning rate:

t = (4, 5)^T   α = 0.1

2. Write out the forward propagation algorithm in Python. Use the NumPy
library for matrices.
3. Write out the backpropagation algorithm in Python. Use the NumPy
library for matrices.
4. Write an entire Neural Network in Python, using the NumPy library.

