
L5 Neural Network

The document provides an introduction to machine learning and artificial neural networks. It discusses supervised learning, unsupervised learning, and reinforcement learning. It then focuses on artificial neural networks, describing their biological inspiration, basic neuron structure, different activation functions, network architectures including feedforward and recurrent networks, and training approaches focused on parameter learning.


Machine Learning

(Học máy – IT3190E)

Khoat Than
School of Information and Communication Technology
Hanoi University of Science and Technology

2023
2

Contents
¡ Introduction to Machine Learning
¡ Supervised learning
¨ Artificial neural network

¡ Unsupervised learning

¡ Reinforcement learning

¡ Practical advice
3
Artificial neural network: introduction (1)
¡ Artificial neural network (ANN) (mạng nơron nhân tạo)
¡ Simulates biological neural systems (the human brain)
¡ An ANN is a structure/network made of interconnected artificial neurons
¡ Neuron
¡ Has inputs/outputs
¡ Executes a local computation (local function)
¡ The output of a neuron is characterized by
¡ Its input/output characteristics
¡ Its connections to other neurons
¡ (Possibly) other inputs
4
Artificial neural network: introduction (2)
¡ An ANN can be thought of as a highly decentralized and parallel
information-processing structure
¡ An ANN can learn, recall, and generalize from training data
¡ The ability of an ANN depends on
¡ Its network architecture
¡ Its input/output characteristics
¡ The learning algorithm
¡ The training data
5
Structure of a neuron
¡ Input signals of a neuron: {x_i, i = 1...m}
¡ Each input signal x_i is associated with a weight w_i
¡ Bias w_0 (with x_0 = 1)
¡ The net input is a combination of the input signals: Net(w, x)
¡ The activation/transfer function f(.) computes the output of the neuron
¡ Output: Out = f(Net(w, x))

[Figure: a neuron with inputs x_0 = 1, x_1, ..., x_m, weights w_0, w_1, ..., w_m, a summation unit Σ producing the net input Net, and an activation function f producing the output Out]
6
Net Input
¡ The net input is usually a linear function of the inputs:

Net = w_0 + w_1 x_1 + w_2 x_2 + ... + w_m x_m = w_0 . 1 + Σ_{i=1}^{m} w_i x_i = Σ_{i=0}^{m} w_i x_i

¡ Role of the bias:
¡ Net = w_1 x_1 may not separate the classes well
¡ Net = w_1 x_1 + w_0 is able to do better

[Figure: two plots of Net versus x_1, one for Net = w_1 x_1 (a line through the origin) and one for Net = w_1 x_1 + w_0 (a line shifted by the bias)]
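A minimal sketch of the net-input computation above, in plain Python; the function name net_input is illustrative, not from the slides.

    def net_input(w, x):
        """Net(w, x) = w0*1 + w1*x1 + ... + wm*xm.

        w: list of m+1 weights [w0, w1, ..., wm] (w0 is the bias).
        x: list of m input signals [x1, ..., xm].
        """
        inputs = [1.0] + list(x)          # prepend x0 = 1 for the bias term
        return sum(wi * xi for wi, xi in zip(w, inputs))

    # Example: w0 = 0.5 (bias), w1 = 2.0, w2 = -1.0, inputs x = (1.0, 3.0)
    # Net = 0.5 + 2.0*1.0 - 1.0*3.0 = -0.5
    print(net_input([0.5, 2.0, -1.0], [1.0, 3.0]))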
7
Activation function: hard-limited
"$ 1, if Net ≥ θ
¡ Also known as a threshold Out(Net) = HL(Net, θ ) = #
function $% 0, otherwise

¡ The output takes one


of the two values Out(Net) = HL2(Net, θ ) = sign(Net, θ )

¡ q is the threshold value


¡ Properties: discontinuous, non-smoothed (không liên tục, không trơn)

Bipolar
Out
Binary
Out
hard-limiter
hard-limiter
1 1

q 0 Net
q 0 Net
-1
8
Activation function: threshold logic
Out(Net) = tl(Net, α, θ) = 0 if Net < -θ;  α(Net + θ) if -θ ≤ Net ≤ 1/α - θ;  1 if Net > 1/α - θ   (α > 0)
         = max(0, min(1, α(Net + θ)))

¡ Also known as a saturating linear function
¡ A combination of two activation functions: linear and hard-limit
¡ α determines the slope of the linear range
¡ Properties: continuous, non-smooth (liên tục, không trơn)

[Figure: Out versus Net; Out rises linearly from 0 at Net = -θ to 1 at Net = 1/α - θ, a linear range of width 1/α]
9
Activation function: Sigmoid
Out(Net) = sf(Net, α, θ) = 1 / (1 + e^(-α(Net + θ)))

¡ Popular
¡ The parameter α determines the slope
¡ Output is in the range (0, 1)
¡ Advantages
¡ Continuous, smooth
¡ The gradient of a sigmoid function can be expressed in terms of the function itself

[Figure: the sigmoid curve; Out rises from 0 to 1 and passes through 0.5 at Net = -θ]
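For reference, the self-referential gradient mentioned in the last bullet, written out for this parameterized sigmoid (a standard identity, not shown explicitly on the slide):

\frac{\partial\, Out}{\partial\, Net} = \alpha \, Out \, (1 - Out), \quad \text{where } Out = \frac{1}{1 + e^{-\alpha(Net+\theta)}}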
10
Activation function: Hyperbolic tangent
Out(Net) = tanh(Net, α, θ) = (1 - e^(-α(Net + θ))) / (1 + e^(-α(Net + θ))) = 2 / (1 + e^(-α(Net + θ))) - 1

¡ Popular
¡ The parameter α determines the slope
¡ Output is in the range (-1, 1)
¡ Advantages
¡ Continuous, with a continuous derivative
¡ The gradient of a tanh function can be expressed in terms of the function itself

[Figure: the hyperbolic-tangent curve; Out rises from -1 to 1 and passes through 0 at Net = -θ]
11
Act. function: Rectified linear unit (ReLU)
𝑂𝑢𝑡 𝑛𝑒𝑡 = max(0, 𝑛𝑒𝑡)

¡ Most popular
¡ Output is non-negative
¡ Advantages
¡ Continuous, easy to compute
¡ Note: not differentiable at the point 0
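A minimal sketch of the activation functions from the last few slides, in plain Python; the function names are mine, the derivative helpers use the self-referential forms noted above, and for ReLU the derivative at 0 is set to 0 by convention (the slides leave it undefined).

    import math

    def hard_limit(net, theta=0.0):
        # binary hard-limiter: 1 if net >= theta, else 0
        return 1.0 if net >= theta else 0.0

    def saturating_linear(net, alpha=1.0, theta=0.0):
        # threshold logic: max(0, min(1, alpha*(net + theta)))
        return max(0.0, min(1.0, alpha * (net + theta)))

    def sigmoid(net, alpha=1.0, theta=0.0):
        return 1.0 / (1.0 + math.exp(-alpha * (net + theta)))

    def sigmoid_grad(out, alpha=1.0):
        # derivative w.r.t. net, written in terms of the output itself
        return alpha * out * (1.0 - out)

    def tanh_act(net, alpha=1.0, theta=0.0):
        return 2.0 / (1.0 + math.exp(-alpha * (net + theta))) - 1.0

    def tanh_grad(out, alpha=1.0):
        # derivative w.r.t. net, in terms of the output: (alpha/2)*(1 - out^2)
        return 0.5 * alpha * (1.0 - out * out)

    def relu(net):
        return max(0.0, net)

    def relu_grad(net):
        # not differentiable at 0; the subgradient 0 is used here by convention
        return 1.0 if net > 0 else 0.0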
12
ANN: Architecture (1)
¡ An ANN's architecture is determined by
¡ The number of input and output signals
¡ The number of layers
¡ The number of neurons in each layer
¡ The number of connections of each neuron
¡ How neurons (within a layer, or between layers) are connected
¡ An ANN must have
¡ An input layer
¡ An output layer
¡ No, one, or multiple hidden layers

E.g.: an ANN with a single hidden layer
• Input: 3 signals
• Output: 2 signals
• In total, 6 neurons:
  - 4 neurons in the hidden layer
  - 2 neurons in the output layer

[Figure: a network diagram with 3 inputs plus a bias, a hidden layer of 4 neurons, and an output layer of 2 neurons]


13
ANN: Architecture (2)
¡ A layer (tầng) contains a set of neurons
¡ A hidden layer (tầng ẩn) is a layer between the input layer and the output layer
¡ Hidden nodes do not interact directly with the external environment of the
neural network
¡ An ANN is called fully connected if the outputs of a layer are connected
to all neurons of the next layer
14
ANN: Architecture (3)
¡ An ANN is called a feed-forward network (mạng lan truyền tiến)
if no output of a node is an input of another node in the same layer or a
previous layer
¡ When the output of a node is an input of a node in the same layer or a
previous layer, the network is called a feedback network (mạng phản hồi)
¡ If the feedback connects to the input of nodes of the same layer, it
is called lateral feedback
¡ Feedback networks with closed loops are called recurrent networks
(mạng hồi quy)
15
ANN: Architecture (4)
[Figure: example architectures: a feed-forward network, a feed-forward network with multiple layers, a neuron with feedback to itself, a recurrent network with a single layer, and a recurrent network with multiple layers]
16
ANN: Training
¡ 2 types of learning in ANNs
¡ Parameter learning: The goal is to adapt the weights of the
connections in the ANN, given a fixed network structure
¡ Structure learning: The goal is to learn the network structure,
including the number of neurons and the types of connections
between them, and the weights

¡ Those two types can be done simultaneously or separately


¡ In this lecture, we will only consider parameter learning
17
ANN: Idea for training
¡ Training a neural network (when fixing the architecture) is learning the
weights w of the network from training data D
¡ Learning can be done by minimizing an empirical loss function

L(w) = (1/|D|) Σ_{x∈D} loss(d_x, out(x))

§ where out(x) is the output of the network for input x, d_x is the label of x,
and loss is a function measuring the prediction error

¡ Many gradient-based methods:
¡ Backpropagation
¡ Stochastic gradient descent (SGD)
¡ Adam
¡ AdaGrad

[Figure: a single neuron with inputs x_0, x_1, ..., x_m, weights w_0, ..., w_m, summation Σ and output Out]
18
Perceptron
¡ A perceptron is the simplest type of ANN (only one neuron)
¡ It uses the hard-limited activation function

Out = sign(Net(w, x)) = sign( Σ_{j=0}^{m} w_j x_j )

¡ For an input x, the output value of the perceptron is
¡ 1 if Net(w, x) > 0
¡ -1 otherwise

[Figure: a perceptron with inputs x_0 = 1, x_1, ..., x_m and weights w_0, ..., w_m; in the (x_1, x_2) plane the separating hyperplane w_0 + w_1 x_1 + w_2 x_2 = 0 divides the region with Output = 1 from the region with Output = -1]
19
Perceptron: Algorithm
¡ Training data D = {(x, d)}
¡ x is the input vector
¡ d is the desired output (1 or -1)
¡ The goal of the perceptron learning (training) process is to determine a weight
vector that allows the perceptron to produce the correct output value (-1
or 1) for each data point
¡ For a data point x correctly classified by the perceptron, the weight vector w is
unchanged
¡ If d = 1 but the perceptron produces -1 (Out = -1), then w needs to be
changed so that the value of Net(w, x) increases
¡ If d = -1 but the perceptron produces 1 (Out = 1), then w needs to be
changed so that the value of Net(w, x) decreases
20
Perceptron: Batch training
Perceptron_batch(D, η)
  Initialize w (w_i ← an initial (small) random value)
  do
    ∆w ← 0
    for each instance (x, d) ∈ D
      Compute the real output value Out
      if (Out ≠ d)
        ∆w ← ∆w + η(d - Out)x
    end for
    w ← w + ∆w
  until all the training instances in D are correctly classified
  return w
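A runnable sketch of Perceptron_batch above, in plain Python; the helper names and the max_epochs safeguard are mine, added because the loop only terminates on its own when the data are linearly separable (see the next slide).

    def predict(w, x):
        # perceptron output: sign of the net input, with x0 = 1 for the bias
        net = w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))
        return 1 if net > 0 else -1

    def perceptron_batch(D, eta=0.1, max_epochs=1000):
        # D: list of (x, d) pairs, x a list of m features, d in {-1, +1}
        m = len(D[0][0])
        w = [0.0] * (m + 1)                      # [w0, w1, ..., wm]
        for _ in range(max_epochs):
            delta = [0.0] * (m + 1)
            mistakes = 0
            for x, d in D:
                out = predict(w, x)
                if out != d:
                    mistakes += 1
                    delta[0] += eta * (d - out) * 1.0      # bias input x0 = 1
                    for j, xj in enumerate(x, start=1):
                        delta[j] += eta * (d - out) * xj
            w = [wi + dwi for wi, dwi in zip(w, delta)]
            if mistakes == 0:                    # all instances correctly classified
                break
        return w

    # Example on a linearly separable AND-like problem (labels in {-1, +1})
    D = [([0, 0], -1), ([0, 1], -1), ([1, 0], -1), ([1, 1], 1)]
    print(perceptron_batch(D))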
21
Perceptron: Limitation
¡ The training algorithm for the perceptron is proved to converge if:
¡ The data points are linearly separable
¡ A sufficiently small learning rate η is used
¡ The training algorithm for the perceptron may not converge if the data points
are not linearly separable

[Figure: a dataset that is not linearly separable; a perceptron cannot classify this case correctly!]
22
Loss function
¡ Consider an ANN that has n output neurons
¡ For a data point (x, d), the training error caused by the (current)
weight vector w is:

E_x(w) = (1/2) Σ_{i=1}^{n} (d_i - Out_i)²

¡ The training error over the training set D is:

E_D(w) = (1/|D|) Σ_{x∈D} E_x(w)
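A small sketch of the two error measures above, in plain Python; network_out stands for whatever function computes the network's output vector and is an assumption, not something defined in the slides.

    def example_error(d, out):
        # E_x(w) = 1/2 * sum_i (d_i - Out_i)^2 for one data point
        return 0.5 * sum((di - oi) ** 2 for di, oi in zip(d, out))

    def dataset_error(D, network_out):
        # E_D(w) = average of E_x(w) over the training set D = [(x, d), ...]
        return sum(example_error(d, network_out(x)) for x, d in D) / len(D)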
23
Minimize errors with gradients
¡ The gradient of E (denoted by ∇E) is the vector

∇E(w) = ( ∂E/∂w_1, ∂E/∂w_2, ..., ∂E/∂w_N )

¡ where N is the total number of weights (connections) in the ANN
¡ The gradient ∇E determines the direction that causes the steepest
increase of the error value E
¡ Therefore, the direction that causes the steepest decrease is the opposite
of the gradient of E

∆w = -η ∇E(w);   ∆w_i = -η ∂E/∂w_i   for i = 1...N

¡ Requirement: all the activation functions must be smooth (differentiable)


24
Gradient descent: Illustration

[Figure: gradient descent illustrated on a one-dimensional error surface E(w) and on a two-dimensional error surface E(w_1, w_2)]
25
Incremental training
Gradient_descent_incremental(D, η)
  Initialize w (w_i ← an initial (small) random value)
  do
    for each training instance (x, d) ∈ D
      Compute the network output
      for each weight component w_i
        w_i ← w_i - η(∂E_x/∂w_i)
      end for
    end for
  until (stopping criterion satisfied)
  return w

Stopping criterion: number of epochs, error threshold, ...

Note: if we take a small subset (mini-batch) randomly from D to update the
weights, we will have mini-batch training.
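A sketch of the incremental update above, generalized to the mini-batch variant mentioned in the note; plain Python, where grad_Ex stands for a function returning ∂E_x/∂w for one instance and is assumed, not given in the slides.

    import random

    def sgd(D, grad_Ex, w, eta=0.01, batch_size=1, epochs=100):
        # batch_size=1 reproduces the incremental rule; larger values give mini-batch training
        w = list(w)
        for _ in range(epochs):
            data = list(D)
            random.shuffle(data)
            for start in range(0, len(data), batch_size):
                batch = data[start:start + batch_size]
                # average the per-instance gradients over the mini-batch
                grads = [grad_Ex(w, x, d) for x, d in batch]
                for i in range(len(w)):
                    g_i = sum(g[i] for g in grads) / len(grads)
                    w[i] -= eta * g_i          # w_i <- w_i - eta * dE/dw_i
        return w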
26
Backpropagation algorithm
¡ A perceptron can only represent a linear function
¡ A multi-layer NN learned by the Backpropagation (BP) algorithm can
represent a highly non-linear function
¡ The BP algorithm is used to learn the weights of an ANN
¡ with a fixed network structure (một cấu trúc mạng đã chọn trước)
¡ where, for each neuron, the activation function must be differentiable
¡ The BP algorithm applies a gradient descent strategy in its weight-update rules
¡ to minimize the error between the actual output values and the desired output
values on the training data
27
Backpropagation algorithm (1)
¡ The backpropagation algorithm seeks a vector of weights that minimizes
the network error on the training data
¡ The BP algorithm consists of 2 phases:
¡ Forward pass: the input signals (input vector) are forwarded from
the input layer to the output layer (passing through the hidden layers)
¡ Backward pass:
¡ Based on the desired output value for the input vector, calculate
the error value
¡ From the output layer, the error value is propagated backward
through the network, from one layer to the previous layer, down to the input
layer
¡ Error back-propagation is performed by calculating
(recursively) the local gradient value of each neuron
28
Backpropagation algorithm (2)

Signal forward phase:


• Forward signals via the
network

Error backward phase:


• Calculate the error at the output
• Error back-propagation
29
Network structure
¡ Consider the 3-layer neural network in the figure to illustrate the BP algorithm
¡ m input signals x_j (j = 1..m)
¡ l hidden neurons z_q (q = 1..l)
¡ n output neurons y_i (i = 1..n)
¡ w_qj is the weight of the connection from the input signal x_j to the hidden neuron z_q
¡ w_iq is the weight of the connection from the hidden neuron z_q to the output neuron y_i
¡ Out_q is the (local) output value of the hidden neuron z_q
¡ Out_i is the output value of the network corresponding to the output neuron y_i

[Figure: inputs x_1 ... x_m feed the hidden neurons z_1 ... z_l through weights w_qj; the hidden outputs Out_q feed the output neurons y_1 ... y_n through weights w_iq, producing the outputs Out_i]
30
BP algorithm: Forward (1)
¡ For each data point x
¡ The input vector x is forwarded from the input layer to the output layer
¡ The network generates an actual output value Out (a vector with
components Out_i, i = 1..n)
¡ For an input vector x, a neuron z_q at the hidden layer receives the net input

Net_q = Σ_{j=1}^{m} w_qj x_j

and produces the (local) output value

Out_q = f(Net_q) = f( Σ_{j=1}^{m} w_qj x_j )

where f(.) is the activation function of neuron z_q
31
BP algorithm: Forward (2)
¡ The net input of the neuron y_i at the output layer is

Net_i = Σ_{q=1}^{l} w_iq Out_q = Σ_{q=1}^{l} w_iq f( Σ_{j=1}^{m} w_qj x_j )

¡ Neuron y_i produces the output value (an output value of the network)

Out_i = f(Net_i) = f( Σ_{q=1}^{l} w_iq Out_q ) = f( Σ_{q=1}^{l} w_iq f( Σ_{j=1}^{m} w_qj x_j ) )

¡ The vector of output values Out_i (i = 1..n) is the actual output of the
network for the input vector x
32
BP algorithm: Backward (1)
¡ For each data point x
¡ Error signals due to the difference between the desired output
value d and the actual output value Out are calculated
¡ These error signals are back-propagated from the output layer to
the preceding layers, in order to update the weights
¡ To define the error signals and their back-propagation, an error
function is needed:

E(w) = (1/2) Σ_{i=1}^{n} (d_i - Out_i)² = (1/2) Σ_{i=1}^{n} [d_i - f(Net_i)]²
     = (1/2) Σ_{i=1}^{n} [d_i - f( Σ_{q=1}^{l} w_iq Out_q )]²
33
BP algorithm: Backward (2)
¡ According to the gradient descent method, the weights of the
connections from the hidden layer to the output layer are updated by

∆w_iq = -η ∂E/∂w_iq

¡ Using the chain rule for ∂E/∂w_iq, we have

∆w_iq = -η [∂E/∂Out_i] [∂Out_i/∂Net_i] [∂Net_i/∂w_iq] = η (d_i - Out_i) f'(Net_i) Out_q = η δ_i Out_q

¡ δ_i is the error signal of neuron y_i at the output layer

δ_i = -∂E/∂Net_i = -[∂E/∂Out_i] [∂Out_i/∂Net_i] = (d_i - Out_i) f'(Net_i)

where Net_i is the net input of the neuron y_i at the output layer,
and f'(Net_i) = ∂f(Net_i)/∂Net_i
34
BP algorithm: Backward (3)
¡ To update the weights of the connections from the input layer to the
hidden layer, we also apply the gradient descent method and the
chain rule

∆w_qj = -η ∂E/∂w_qj = -η [∂E/∂Out_q] [∂Out_q/∂Net_q] [∂Net_q/∂w_qj]

¡ From the formula for the error function E(w), we see that
each error component (d_i - Out_i) (i = 1..n) is a function of Out_q

E(w) = (1/2) Σ_{i=1}^{n} [d_i - f( Σ_{q=1}^{l} w_iq Out_q )]²
35
BP algorithm: Backward (4)
¡ Applying the chain rule, we have

∆w_qj = η [ Σ_{i=1}^{n} (d_i - Out_i) f'(Net_i) w_iq ] f'(Net_q) x_j
      = η [ Σ_{i=1}^{n} δ_i w_iq ] f'(Net_q) x_j = η δ_q x_j

¡ δ_q is the error signal of neuron z_q at the hidden layer

δ_q = -∂E/∂Net_q = -[∂E/∂Out_q] [∂Out_q/∂Net_q] = f'(Net_q) Σ_{i=1}^{n} δ_i w_iq

where Net_q is the net input of the neuron z_q at the hidden layer,
and f'(Net_q) = ∂f(Net_q)/∂Net_q
36
BP algorithm: Backward (5)
¡ According to the formulas for the error signals δ_i and δ_q, the
error signal of a neuron in the hidden layer differs from the error
signal of a neuron in the output layer
¡ Because of this difference, the weight-update procedure in the BP
algorithm is also known as the generalized delta learning rule
¡ The error signal δ_q of neuron z_q at the hidden layer is determined by:
¡ The error signals δ_i of the neurons y_i at the output layer (to which neuron z_q is
connected)
¡ The weights w_iq
37
BP algorithm: Backward (6)
¡ The process of calculating the error signals as above can easily be extended
(generalized) to neural networks with more than 1 hidden layer
¡ The general form of the weight-update rule in the BP algorithm is

∆w_ab = η δ_a x_b

¡ b and a are the two indices corresponding to the two ends of the
connection (b → a) (from a neuron (or input signal) b to neuron a)
¡ x_b is the output value of the hidden-layer neuron (or input signal) b
¡ δ_a is the error signal of neuron a
38
BP algorithm
Back_propagation_incremental(D, η)
  The neural network consists of Q layers, q = 1, 2, ..., Q
  Net_i^(q) and Out_i^(q) are the net input and the output value of neuron i at layer q
  The network has m input signals and n output neurons
  w_ij^(q) is the weight of the connection from neuron j at layer (q-1) to neuron i at layer q

Step 0 (Initialization)
  Select the error threshold E_threshold (the acceptable error value)
  Initialize the weights with small random values
  Assign E = 0
Step 1 (Start a training cycle)
  Apply the input vector of data point k to the input layer (q = 1):
  Out_i^(1) = x_i^(k), for all i
Step 2 (Forward)
  Forward the input signals through the network until the network output values
  Out_i^(Q) (at the output layer) are obtained:

  Out_i^(q) = f(Net_i^(q)) = f( Σ_j w_ij^(q) Out_j^(q-1) )
39
BP algorithm
Step 3 (Calculate the output error)
  Calculate the network output error and the error signal δ_i^(Q) of each neuron at the output layer:

  E = E + (1/2) Σ_{i=1}^{n} (d_i^(k) - Out_i^(Q))²

  δ_i^(Q) = (d_i^(k) - Out_i^(Q)) f'(Net_i^(Q))

Step 4 (Error backward)
  Backpropagate the error to update the weights and calculate the error signals δ_i^(q-1) for the preceding layers:

  ∆w_ij^(q) = η δ_i^(q) Out_j^(q-1);   w_ij^(q) = w_ij^(q) + ∆w_ij^(q)

  δ_i^(q-1) = f'(Net_i^(q-1)) Σ_j w_ji^(q) δ_j^(q);   for all q = Q, Q-1, ..., 2

Step 5 (Check whether all training data have been used)
  If the entire training set has been used, go to Step 6; otherwise go to Step 1
Step 6 (Check the net error)
  If the net error E is below the acceptable threshold (E < E_threshold), training is completed
  and the learned weights are returned;
  otherwise, assign E = 0 and start a new training cycle (go back to Step 1)
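A compact NumPy sketch of one incremental BP step for the three-layer network of the earlier slides (m inputs, l hidden neurons, n outputs, sigmoid activations); all names are mine and biases are omitted to match the slides' formulas, so treat this as an illustration of the update equations rather than the exact algorithm listing above.

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def bp_step(x, d, W_hid, W_out, eta=0.1):
        """One forward + backward pass for a single data point (x, d).

        W_hid: (l, m) weights w_qj from inputs to hidden neurons.
        W_out: (n, l) weights w_iq from hidden neurons to outputs.
        Returns the updated weight matrices and the example error E_x.
        """
        # ----- forward pass -----
        net_hid = W_hid @ x                  # Net_q = sum_j w_qj x_j
        out_hid = sigmoid(net_hid)           # Out_q = f(Net_q)
        net_out = W_out @ out_hid            # Net_i = sum_q w_iq Out_q
        out = sigmoid(net_out)               # Out_i = f(Net_i)

        # ----- backward pass -----
        # output-layer error signals: delta_i = (d_i - Out_i) * f'(Net_i), with f' = Out(1-Out)
        delta_out = (d - out) * out * (1.0 - out)
        # hidden-layer error signals: delta_q = f'(Net_q) * sum_i delta_i w_iq
        delta_hid = out_hid * (1.0 - out_hid) * (W_out.T @ delta_out)

        # weight updates: Delta w_iq = eta*delta_i*Out_q, Delta w_qj = eta*delta_q*x_j
        W_out = W_out + eta * np.outer(delta_out, out_hid)
        W_hid = W_hid + eta * np.outer(delta_hid, x)

        E_x = 0.5 * np.sum((d - out) ** 2)
        return W_hid, W_out, E_x

    # Example: m=2 inputs, l=3 hidden neurons, n=1 output, small random initial weights
    rng = np.random.default_rng(0)
    W_hid = rng.uniform(-0.1, 0.1, size=(3, 2))
    W_out = rng.uniform(-0.1, 0.1, size=(1, 3))
    x, d = np.array([1.0, 0.0]), np.array([1.0])
    W_hid, W_out, err = bp_step(x, d, W_hid, W_out)
    print(err)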
40
BP algorithm: Forward (1)

[Figure: the example network used in the following slides: inputs x_1, x_2; first hidden layer neurons 1, 2, 3; second hidden layer neurons 4, 5; output neuron 6, which produces Out_6. Each neuron computes f(Net_i).]
41
BP algorithm: Forward (2)

[Figure: connections x_1 → neuron 1 (weight w_{1x1}) and x_2 → neuron 1 (weight w_{1x2}) highlighted]

Out_1 = f(w_{1x1} x_1 + w_{1x2} x_2)
42
BP algorithm: Forward (3)

[Figure: connections x_1 → neuron 2 (weight w_{2x1}) and x_2 → neuron 2 (weight w_{2x2}) highlighted]

Out_2 = f(w_{2x1} x_1 + w_{2x2} x_2)
43
BP algorithm: Forward (4)

[Figure: connections x_1 → neuron 3 (weight w_{3x1}) and x_2 → neuron 3 (weight w_{3x2}) highlighted]

Out_3 = f(w_{3x1} x_1 + w_{3x2} x_2)
44
BP algorithm: Forward (5)

[Figure: connections from neurons 1, 2, 3 to neuron 4 (weights w_41, w_42, w_43) highlighted]

Out_4 = f(w_41 Out_1 + w_42 Out_2 + w_43 Out_3)
45
BP algorithm: Forward (6)

[Figure: connections from neurons 1, 2, 3 to neuron 5 (weights w_51, w_52, w_53) highlighted]

Out_5 = f(w_51 Out_1 + w_52 Out_2 + w_53 Out_3)
46
BP algorithm: Forward (7)

[Figure: connections from neurons 4 and 5 to the output neuron 6 (weights w_64, w_65) highlighted]

Out_6 = f(w_64 Out_4 + w_65 Out_5)
47
BP algorithm: Calculate error

[Figure: the output neuron 6, with d the desired output value]

δ_6 = -∂E/∂Net_6 = -[∂E/∂Out_6] [∂Out_6/∂Net_6] = (d - Out_6) f'(Net_6)
48
BP algorithm: Backward(1)

[Figure: the error signal δ_6 propagated back to neuron 4 through weight w_64]

δ_4 = f'(Net_4) (w_64 δ_6)
49
BP algorithm: Backward(2)

[Figure: the error signal δ_6 propagated back to neuron 5 through weight w_65]

δ_5 = f'(Net_5) (w_65 δ_6)
50
BP algorithm: Backward(3)

[Figure: the error signals δ_4 and δ_5 propagated back to neuron 1 through weights w_41 and w_51]

δ_1 = f'(Net_1) (w_41 δ_4 + w_51 δ_5)


51
BP algorithm: Backward(4)

[Figure: the error signals δ_4 and δ_5 propagated back to neuron 2 through weights w_42 and w_52]

δ_2 = f'(Net_2) (w_42 δ_4 + w_52 δ_5)
52
BP algorithm: Backward(5)

[Figure: the error signals δ_4 and δ_5 propagated back to neuron 3 through weights w_43 and w_53]

δ_3 = f'(Net_3) (w_43 δ_4 + w_53 δ_5)
53
BP algorithm: Update weight(1)

[Figure: the weights on the connections x_1 → neuron 1 and x_2 → neuron 1 are updated]

w_{1x1} = w_{1x1} + η δ_1 x_1
w_{1x2} = w_{1x2} + η δ_1 x_2
54
BP algorithm: Update weight(2)

[Figure: the weights on the connections x_1 → neuron 2 and x_2 → neuron 2 are updated]

w_{2x1} = w_{2x1} + η δ_2 x_1
w_{2x2} = w_{2x2} + η δ_2 x_2
55
BP algorithm: Update weight(3)

[Figure: the weights on the connections x_1 → neuron 3 and x_2 → neuron 3 are updated]

w_{3x1} = w_{3x1} + η δ_3 x_1
w_{3x2} = w_{3x2} + η δ_3 x_2
56
BP algorithm: Update weight(4)

[Figure: the weights on the connections from neurons 1, 2, 3 to neuron 4 are updated]

w_41 = w_41 + η δ_4 Out_1
w_42 = w_42 + η δ_4 Out_2
w_43 = w_43 + η δ_4 Out_3
57
BP algorithm: Update weight(5)

[Figure: the weights on the connections from neurons 1, 2, 3 to neuron 5 are updated]

w_51 = w_51 + η δ_5 Out_1
w_52 = w_52 + η δ_5 Out_2
w_53 = w_53 + η δ_5 Out_3
58
BP algorithm: Update weight(6)

[Figure: the weights on the connections from neurons 4 and 5 to the output neuron 6 are updated]

w_64 = w_64 + η δ_6 Out_4
w_65 = w_65 + η δ_6 Out_5
59
BP algorithm: Initialize weights
¡ Normally, weights are initialized with small random values
¡ If the weights have large initial values
¡ Sigmoid functions will reach saturation soon
¡ The system may get stuck at saddle/stationary points
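A sketch of the usual initialization described above, in plain Python; the range ±0.1 is an assumed example of "small", not a value given in the slides.

    import random

    def init_weights(n_weights, scale=0.1):
        # small random values around zero, so sigmoid units start far from saturation
        return [random.uniform(-scale, scale) for _ in range(n_weights)]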
60
BP algorithm: Learning rate
¡ Has an important effect on the efficiency and convergence of the BP algorithm
¡ A large value of η can accelerate the convergence of the learning
process, but can cause the system to miss the global optimum
or settle at bad points (saddle points)
¡ A small value of η can make the learning process take a long time
¡ η is often selected empirically
¡ Good values of the learning rate at the beginning of the learning process may
not be good at a later time
¡ Use an adaptive (dynamic) learning rate? (see the sketch below)
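One simple way to make η adaptive, as the last bullet suggests; a common decay schedule offered as an illustration, not something prescribed by the slides.

    def decayed_learning_rate(eta0, epoch, decay=0.95):
        # shrink the initial learning rate by the factor `decay` after every epoch
        return eta0 * (decay ** epoch)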
61
BP algorithm: Momentum
¡ The gradient descent method can be very slow if η is small, and can
fluctuate greatly if η is too large
¡ To reduce the fluctuations, a momentum component is added:

∆w(t) = -η ∇E(t) + α ∆w(t-1)

¡ where α (∈ [0, 1]) is the momentum parameter (usually set to about 0.9)
¡ We should choose reasonable values for the learning rate and the momentum, satisfying

(η + α) ≳ 1,  with α > η, to avoid fluctuations

[Figure: gradient descent trajectories on a simple quadratic error function; the left trajectory, without momentum, zig-zags between points A, A', B, B', while the right trajectory combines -η∇E(t+1) with α∆w(t) and converges more smoothly]
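A sketch of the momentum update above, in plain Python; grad_E stands for a function returning ∇E(w) and is assumed, as is the quadratic example.

    def gd_with_momentum(w, grad_E, eta=0.1, alpha=0.9, steps=100):
        # Delta_w(t) = -eta * grad E(t) + alpha * Delta_w(t-1)
        delta_prev = [0.0] * len(w)
        w = list(w)
        for _ in range(steps):
            g = grad_E(w)
            delta = [-eta * gi + alpha * dp for gi, dp in zip(g, delta_prev)]
            w = [wi + di for wi, di in zip(w, delta)]
            delta_prev = delta
        return w

    # Example: minimize E(w) = w1^2 + 10*w2^2 (a simple quadratic, chosen for illustration)
    grad = lambda w: [2 * w[0], 20 * w[1]]
    print(gd_with_momentum([1.0, 1.0], grad, eta=0.05, alpha=0.9))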
62
BP algorithm: Number of neurons
¡ The size (number of neurons) of the hidden layer is an important
question when applying multi-layer neural networks to practical problems
¡ In practice, it is difficult to identify the exact number of neurons needed to
achieve the desired system accuracy
¡ The size of the hidden layer is usually determined through experiments
(trial and error)
63
ANN: Learning limit
¡ Boolean functions
¡ Any binary function can be learnt (approximated well) by an ANN
using one hidden layer
¡ Continuous functions
¡ Any bounded continuous function can be learnt (approximated) by
an ANN using one hidden layer [Cybenko, 1989; Hornik et al., 1991]
64
ANN: advantages, disadvantages
¡ Advantages
¡ Supports highly parallel computation
¡ Achieves high accuracy in many problems (image, video, audio, text)
¡ Flexible in network architecture
¡ Disadvantages
¡ There is no general rule for determining the network architecture
and the optimal parameters for a given problem
¡ An ANN's inner workings are hard to interpret (thus, an ANN system is
often viewed as a "black box")
¡ It is difficult (or impossible) to give explanations to the user
¡ There is little fundamental theory to explain the practical successes
65
ANN: When?
¡ The form of the function to be learned is not known in advance
¡ It is not necessary (or not important) to provide the user with an explanation of
the results
¡ A long training time is acceptable
¡ A large amount of labeled data can be collected
66
Open library
67
References
¡ Cybenko, G. (1989). "Approximations by superpositions of sigmoidal functions". Mathematics of Control, Signals, and Systems, 2(4), 303–314.
¡ Hornik, K. (1991). "Approximation Capabilities of Multilayer Feedforward Networks". Neural Networks, 4(2), 251–257.
