Neural Networks and Fuzzy Systems: Multi-Layer Feed Forward Networks

The document discusses multi-layer feedforward neural networks (MLFFN). MLFFNs are needed to overcome the limitations of single-layer perceptrons and solve nonlinear problems. They do this by dividing the problem space into smaller linearly separable regions and combining the outputs of multiple hidden neurons. Gradient descent and backpropagation are used to minimize an error function and update weights and biases in the network to reduce errors.


Neural Networks and Fuzzy Systems
Multi-layer Feed forward Networks

Dr. Tamer Ahmed Farrag
Course No.: 803522-3
Course Outline
Part I : Neural Networks (11 weeks)
• Introduction to Machine Learning
• Fundamental Concepts of Artificial Neural Networks
(ANN)
• Single layer Perceptron Classifier
• Multi-layer Feed forward Networks
• Single layer Feedback Networks
• Unsupervised learning
Part II : Fuzzy Systems (4 weeks)
• Fuzzy set theory
• Fuzzy Systems
2
Outline

• Why do we need Multi-layer Feed forward Networks (MLFF)?
• Error Function (or Cost Function or Loss Function)
• Gradient Descent
• Backpropagation

3
Why do we need Multi-layer Feed forward Networks (MLFF)?

• To overcome the failure of the single layer perceptron in solving nonlinear problems.
• First suggestion:
  • Divide the problem space into smaller linearly separable regions.
  • Use a perceptron for each linearly separable region.
  • Combine the outputs of multiple hidden neurons in a final decision neuron (see the sketch after this slide).

[Figure: a nonlinear problem space split into Region 1 and Region 2 by two linear boundaries.]

4
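To make the first suggestion concrete, here is a minimal sketch (not taken from the slides) of how two perceptrons plus a combining neuron solve XOR, a problem a single perceptron cannot solve; the weight and bias values are hand-picked for illustration.

import numpy as np

def step(z):
    # Hard-limit activation: 1 if z > 0, else 0
    return (z > 0).astype(int)

def perceptron(x, w, b):
    # Single perceptron: weighted sum followed by a step activation
    return step(np.dot(x, w) + b)

def xor_mlff(x):
    h1 = perceptron(x, np.array([1.0, 1.0]), -0.5)   # fires in the OR region
    h2 = perceptron(x, np.array([1.0, 1.0]), -1.5)   # fires in the AND region
    # Final decision neuron: inside the OR region but not the AND region -> XOR
    return perceptron(np.array([h1, h2]), np.array([1.0, -1.0]), -0.5)

for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x, xor_mlff(np.array(x, dtype=float)))   # prints 0, 1, 1, 0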
Why do we need Multi-layer Feed forward Networks (MLFF)?
• Second suggestion:
  • In some cases we need a curved decision boundary, or we try to solve more complicated classification and regression problems.
  • So, we need to:
    • Add more layers.
    • Increase the number of neurons in each layer.
    • Use nonlinear activation functions in the hidden layers.

• So, we need Multi-layer Feed forward Networks (MLFF).

5
Notation for Multi-Layer Networks
• Dealing with multi-layer networks is easy if a sensible notation is adopted.
• We simply need another label (n) to tell us which layer in the network we are dealing with.
• Each unit j in layer n receives activations from the previous layer of processing units and sends activations to the next layer of units (the general layer equation is given after this slide).

[Figure: units in layer (0), layer (1), ..., layer (n-1), layer (n), connected by weights $w_{ij}^{(1)}, \dots, w_{ij}^{(n)}$, where $w_{ij}^{(n)}$ is the weight from unit i in layer (n-1) to unit j in layer (n).]
6
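With this notation, the computation performed by each unit can be stated compactly. The following is a standard form consistent with the slide's notation; the bias term $b_j^{(n)}$ and the generic activation $f$ are assumptions, since the original figure is not fully recoverable:

$z_j^{(n)} = \sum_i w_{ij}^{(n)} a_i^{(n-1)} + b_j^{(n)}, \qquad a_j^{(n)} = f\left(z_j^{(n)}\right)$

where $f$ is the activation function used in layer $n$.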
ANN Representation
(1 input layer + 1 hidden layer + 1 output layer)

[Figure: a fully connected network with input layer (0), hidden layer (1) and output layer (2). The inputs $x_i = a_i^{(0)}$ feed the hidden units $(z_j^{(1)} \mid a_j^{(1)})$ through the weights $w_{ij}^{(1)}$, and the hidden activations feed the output units $(z_k^{(2)} \mid a_k^{(2)}) = y_k$ through the weights $w_{jk}^{(2)}$.]

• For example: $z_1^{(1)} = \sum_i w_{i1}^{(1)} a_i^{(0)} + b_1^{(1)}$ with $a_1^{(1)} = \sigma\left(z_1^{(1)}\right)$, and $z_1^{(2)} = \sum_j w_{j1}^{(2)} a_j^{(1)} + b_1^{(2)}$ with $y_1 = a_1^{(2)} = \sigma\left(z_1^{(2)}\right)$.
7
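A minimal forward-pass sketch of such a network in NumPy; the layer sizes, random weight values and sigmoid activations are illustrative assumptions rather than the exact network drawn on the slide.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Assumed sizes: 2 inputs, 3 hidden units, 2 outputs
rng = np.random.default_rng(0)
W1 = rng.normal(size=(2, 3))   # W1[i, j] = w_ij^(1), input i -> hidden j
b1 = np.zeros(3)
W2 = rng.normal(size=(3, 2))   # W2[j, k] = w_jk^(2), hidden j -> output k
b2 = np.zeros(2)

def forward(x):
    # Layer 1: z_j^(1) = sum_i w_ij^(1) a_i^(0) + b_j^(1)
    z1 = x @ W1 + b1
    a1 = sigmoid(z1)
    # Layer 2: z_k^(2) = sum_j w_jk^(2) a_j^(1) + b_k^(2)
    z2 = a1 @ W2 + b2
    y = sigmoid(z2)
    return y

print(forward(np.array([0.5, -1.0])))   # two output activations in (0, 1)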
Gradient Descent
and Backpropagation
Error Function
● How can we evaluate the performance of a neuron?
● We can use an error function (or cost function, or loss function) to measure how far off we are from the expected value.
● Choosing an appropriate error function helps the learning algorithm reach the best values for the weights and biases.
● We’ll use the following variables:
○ D to represent the true (desired) value
○ y to represent the neuron’s prediction

9
Error Functions
(Cost Function or Loss Function)
• There are many formulas for error functions.
• In this course, we will deal with two error function formulas.

1. Sum Squared Error (SSE), for a single perceptron:
   $E = \frac{1}{2}\sum_{k}\left(D_k - y_k\right)^2$

2. Cross Entropy (CE):
   $E = -\sum_{k}\left[\,D_k \ln y_k + (1 - D_k)\ln(1 - y_k)\,\right]$

10
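A short sketch of these two error functions in code; the exact scaling (the 1/2 factor, summing rather than averaging) follows the formulas above and may differ from other textbook variants.

import numpy as np

def sse(D, y):
    # Sum Squared Error: 0.5 * sum of squared differences
    return 0.5 * np.sum((D - y) ** 2)

def cross_entropy(D, y, eps=1e-12):
    # Binary cross entropy; eps avoids log(0)
    y = np.clip(y, eps, 1.0 - eps)
    return -np.sum(D * np.log(y) + (1.0 - D) * np.log(1.0 - y))

D = np.array([1.0, 0.0])     # desired values
y = np.array([0.8, 0.3])     # network predictions
print(sse(D, y), cross_entropy(D, y))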
Why does the error in an ANN occur?
• Each weight and bias in the network contributes to the error.

• To reduce this error we need:
  • A cost function or error function to compute the error (SSE or CE).
  • An optimization algorithm to minimize the error function (Gradient Descent).
  • A learning algorithm to modify the weights and biases to new values that bring the error down (Backpropagation).
  • Repeat this process until the best solution is found.

11
Gradient Descent (in 1 dimension)
• Assume we have an error function E and we need to use it to update one weight w.
• The figure shows the error function in terms of w.
• Our target is to learn the value of w that produces the minimum value of E.

How?

[Figure: error E plotted against weight w, with the minimum marked.]

12
Gradient Descent (in 1 dimension)
• In the Gradient Descent algorithm, we use the following equation to get a better value of w:
  $w_{new} = w_{old} + \Delta w$   (called the Delta rule)
Where:
  $\eta$ : the learning rate
  $\Delta w$ : can be computed mathematically using the derivative of E with respect to w:
  $\Delta w = -\eta \dfrac{dE}{dw}$   (3)

[Figure: gradient descent steps on the curve of E against w, moving toward the minimum.]

13
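A minimal sketch of the delta rule in one dimension; the quadratic error function E(w) = (w - 3)^2, the learning rate and the number of steps are illustrative assumptions.

# Illustrative error function E(w) = (w - 3)^2, with dE/dw = 2 * (w - 3)
def dE_dw(w):
    return 2.0 * (w - 3.0)

eta = 0.1          # learning rate
w = 0.0            # initial weight
for step in range(50):
    delta_w = -eta * dE_dw(w)   # Delta rule: Δw = -η dE/dw
    w = w + delta_w
print(w)            # close to 3.0, the minimum of E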
Local Minima problem
• Gradient descent follows the local slope of E, so it can get stuck in a local minimum instead of reaching the global minimum.

14
Choosing the learning rate
• If $\eta$ is too small, learning is very slow; if $\eta$ is too large, the updates can overshoot the minimum and fail to converge.

15
Gradient Descent (multi dimension)
• In an ANN with many layers and many neurons in each layer, the error function is a multi-variable function.
• So, the derivative in equation (3) should be a partial derivative:
  $\Delta w_{ij} = -\eta \dfrac{\partial E}{\partial w_{ij}}$   (4)

• We write equation (4) as:
  $w_{ij(new)} = w_{ij(old)} - \eta \dfrac{\partial E}{\partial w_{ij}}$

• The same process is used to get the new bias value:
  $b_{j(new)} = b_{j(old)} - \eta \dfrac{\partial E}{\partial b_{j}}$

16
Derivative of activation functions

Sigmoid:  $f(z) = \dfrac{1}{1 + e^{-z}}$,  $f'(z) = f(z)\,\big(1 - f(z)\big)$
17
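A short sketch that checks the sigmoid derivative numerically against a central-difference approximation (purely illustrative).

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_prime(z):
    # f'(z) = f(z) * (1 - f(z))
    s = sigmoid(z)
    return s * (1.0 - s)

z = 0.7
h = 1e-6
numerical = (sigmoid(z + h) - sigmoid(z - h)) / (2 * h)   # central difference
print(sigmoid_prime(z), numerical)   # the two values agree closely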
Learning Rule in the output layer
• Using SSE as the error function and sigmoid as the activation function, the chain rule gives:
  $\dfrac{\partial E}{\partial w_{jk}} = \dfrac{\partial E}{\partial y_{k}} \cdot \dfrac{\partial y_{k}}{\partial z_{k}} \cdot \dfrac{\partial z_{k}}{\partial w_{jk}}$
Where:
  $\dfrac{\partial E}{\partial y_{k}} = -\left(D_{k} - y_{k}\right)$
From the previous table:
  $\dfrac{\partial y_{k}}{\partial z_{k}} = y_{k}\left(1 - y_{k}\right)$

18
Learning Rule in the output layer (cont.)

• So $\dfrac{\partial z_{k}}{\partial w_{jk}} = a_{j}$ (How?)

• Then:
  $\Delta w_{jk} = \eta \left(D_{k} - y_{k}\right) y_{k}\left(1 - y_{k}\right) a_{j}$
19
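A sketch of this output-layer update in code; the array shapes and the vectorised outer-product form are assumptions made for illustration.

import numpy as np

def output_layer_update(W2, a1, y, D, eta=0.5):
    # delta_k = (D_k - y_k) * y_k * (1 - y_k)   (SSE error, sigmoid output)
    delta = (D - y) * y * (1.0 - y)
    # Δw_jk = eta * delta_k * a_j   for every hidden unit j and output unit k
    delta_W2 = eta * np.outer(a1, delta)
    return W2 + delta_W2

a1 = np.array([0.2, 0.9, 0.5])        # hidden activations a_j^(1)
y  = np.array([0.7, 0.1])             # outputs y_k
D  = np.array([1.0, 0.0])             # desired values D_k
W2 = np.zeros((3, 2))                 # weights w_jk^(2)
print(output_layer_update(W2, a1, y, D))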
Learning Rule in the Hidden layer
• Now we have to determine the appropriate weight change for an input-to-hidden weight.
• This is more complicated because it depends on the error at all of the nodes this weighted connection can lead to.
• The mathematical proof is out of our scope.

20
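Although the proof is skipped, the standard backpropagation result for a sigmoid hidden unit can be stated (this formula is added here for reference; it is not derived on the slides):

$\delta_j^{(1)} = a_j^{(1)}\left(1 - a_j^{(1)}\right) \sum_k \delta_k^{(2)} w_{jk}^{(2)}, \qquad \Delta w_{ij}^{(1)} = \eta\, \delta_j^{(1)} a_i^{(0)}$

where $\delta_k^{(2)} = \left(D_k - y_k\right) y_k \left(1 - y_k\right)$ is the output-layer error term from the previous slides.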
Gradient Descent (Notes)
• Note 1:
  • The neuron activation function (f) should be a defined and differentiable function.
• Note 2:
  • The calculation of $\Delta w$ for the hidden layers will be more difficult (Why?)
• Note 3:
  • The previous calculation will be repeated for each weight and for each bias in the ANN.
  • So, we need large computational power (what about deeper networks?)

21
Gradient Descent (Notes)
• $\Delta w$ represents the change in the values of the weights needed to get a better output.
• The equation for $\Delta w$ depends on the choice of error (cost) function and activation function.
• The Gradient Descent algorithm helps in calculating the new values of the weights and biases.
• Question: is one iteration (one trial) enough to get the best values for the weights and biases?
• Answer: No, we need an extended version: Backpropagation.
22
How does Backpropagation work?

[Figure: Forward propagation passes the inputs from layer 0 through the weights $w^{(1)}$ and $w^{(2)}$ of layers 1 and 2 to produce the outputs $y$; back propagation then passes the output errors $(D - y)$ backwards through the same weights, so that every weight and bias can be updated.]
23
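Putting the pieces together, here is a minimal end-to-end backpropagation sketch on a 2-3-1 network trained on XOR; the network size, learning rate, epoch count, and the use of SSE with sigmoid activations throughout are illustrative assumptions.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(1)
W1, b1 = rng.normal(size=(2, 3)), np.zeros(3)    # input -> hidden
W2, b2 = rng.normal(size=(3, 1)), np.zeros(1)    # hidden -> output

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
D = np.array([[0], [1], [1], [0]], dtype=float)  # XOR targets
eta = 0.5

for epoch in range(5000):
    for x, d in zip(X, D):                       # online (pattern-by-pattern) learning
        # forward propagation
        a1 = sigmoid(x @ W1 + b1)
        y = sigmoid(a1 @ W2 + b2)
        # back propagation of the error terms (deltas)
        delta2 = (d - y) * y * (1 - y)           # output layer
        delta1 = a1 * (1 - a1) * (W2 @ delta2)   # hidden layer
        # delta-rule weight and bias updates
        W2 += eta * np.outer(a1, delta2)
        b2 += eta * delta2
        W1 += eta * np.outer(x, delta1)
        b1 += eta * delta1

for x in X:
    print(x, sigmoid(sigmoid(x @ W1 + b1) @ W2 + b2))   # should approach 0, 1, 1, 0 if training converges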
Online Learning vs. Offline Learning

• Online: Pattern-by-pattern learning
  • Error calculated for each pattern
  • Weights updated after each individual pattern
• Offline: Batch learning
  • Error calculated for all patterns
  • Weights updated once at the end of each epoch

24
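A small sketch contrasting the two update schedules on a one-weight toy problem; the data, learning rate and epoch count are illustrative assumptions.

import numpy as np

# Toy data: targets follow d = 3 * x, so the best single weight is w = 3
X = np.array([1.0, 2.0, 3.0, 4.0])
Dv = np.array([3.0, 6.0, 9.0, 12.0])
eta = 0.01

def grad(w, x, d):
    # dE/dw for one pattern, with E = 0.5 * (d - w*x)^2
    return -(d - w * x) * x

# Online (pattern-by-pattern): update after every pattern
w_online = 0.0
for epoch in range(100):
    for x, d in zip(X, Dv):
        w_online -= eta * grad(w_online, x, d)

# Offline (batch): accumulate over all patterns, update once per epoch
w_batch = 0.0
for epoch in range(100):
    g = sum(grad(w_batch, x, d) for x, d in zip(X, Dv))
    w_batch -= eta * g

print(w_online, w_batch)   # both approach 3.0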
Choosing Appropriate Activation and Cost
Functions
• We already know, from our consideration of single layer networks, what output activation and cost functions should be used for particular problem types.
• We have also seen that non-linear hidden unit activations are needed, such as sigmoids.
• So we can summarize the required network properties:
  • Regression / Function Approximation Problems
    • SSE cost function, linear output activations, sigmoid hidden activations
  • Classification Problems (2 classes, 1 output)
    • CE cost function, sigmoid output and hidden activations
  • Classification Problems (multiple classes, 1 output per class)
    • CE cost function, softmax outputs, sigmoid hidden activations
• In each case, application of the gradient descent learning algorithm (by computing the partial derivatives) leads to appropriate back-propagation weight update equations.

25
Overall picture: learning process on ANN

26
Neural network simulator
• Search the internet to find a neural network simulator and report on it.

For example:
• https://www.mladdict.com/neural-network-simulator

• http://playground.tensorflow.org/

27
