Training of Neural Networks: Q.J. Zhang, Carleton University
(b) Data generator
• Measurement: for each given x_k, measure values of y_k, k = 1, 2, …, p
• Simulation: for each given x_k, use a simulator to calculate y_k, k = 1, 2, …, p (see the sketch below)
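A minimal sketch of the simulation route for data generation; the function `em_simulator` below is a hypothetical stand-in for whatever measurement setup or CAD tool actually produces the responses, not part of the course material:

```python
import numpy as np

def em_simulator(x):
    # Hypothetical stand-in for a measurement or EM/circuit simulation
    # that returns the response y for a given input sample x.
    return np.sin(x[0]) + 0.1 * x[1]

# Sweep p input samples x_k and record the corresponding outputs y_k.
p = 50
X = np.random.uniform(low=[-5.0, 0.0], high=[15.0, 1.0], size=(p, 2))
Y = np.array([em_simulator(x) for x in X])
# (X[k], Y[k]), k = 1, ..., p, form the data pairs (x_k, y_k).
```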
Basis of Comparison: Neural Model Development Using Measurement vs. Neural Model Development Using Simulation

• Availability of problem theory-equations:
  Measurement: model can be developed even if the theory-equations are not available.
  Simulation: model can be developed only for problems that have theory that can be implemented in CAD.
• Assumptions:
  Measurement: no assumptions are involved, and the model could include all the effects present in the real device.
  Simulation: often involves assumptions, and the model will be limited by those assumptions.
• Input parameter sweep:
  Measurement: data generation could be difficult or expensive when the input parameters cannot be easily sampled/changed.
  Simulation: relatively easier to sweep any input parameter.
• Sources of small and large errors:
  Measurement: equipment limitations and tolerances.
  Simulation: accuracy limitations and non-convergence of the simulator.
• Desired output:
  Measurement: for measurable responses only.
  Simulation: any output, as long as it can be computed by the simulator.
• Linear scale

Scale formula:
$$\tilde{x} = \tilde{x}_{\min} + \frac{x - x_{\min}}{x_{\max} - x_{\min}}\left(\tilde{x}_{\max} - \tilde{x}_{\min}\right)$$

De-scale formula:
$$x = x_{\min} + \frac{\tilde{x} - \tilde{x}_{\min}}{\tilde{x}_{\max} - \tilde{x}_{\min}}\left(x_{\max} - x_{\min}\right)$$

• Log scale

Scale formula: $\tilde{x} = \ln(x - x_{\min})$

De-scale formula: $x = x_{\min} + e^{\tilde{x}}$
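A minimal NumPy sketch of these four formulas; the target range $\tilde{x} \in [-1, 1]$ is an assumed default, and any $(\tilde{x}_{\min}, \tilde{x}_{\max})$ can be passed in:

```python
import numpy as np

def linear_scale(x, x_min, x_max, xs_min=-1.0, xs_max=1.0):
    # x~ = x~min + (x - xmin)/(xmax - xmin) * (x~max - x~min)
    return xs_min + (x - x_min) / (x_max - x_min) * (xs_max - xs_min)

def linear_descale(xs, x_min, x_max, xs_min=-1.0, xs_max=1.0):
    # x = xmin + (x~ - x~min)/(x~max - x~min) * (xmax - xmin)
    return x_min + (xs - xs_min) / (xs_max - xs_min) * (x_max - x_min)

def log_scale(x, x_min):
    # x~ = ln(x - xmin); requires x > xmin
    return np.log(x - x_min)

def log_descale(xs, x_min):
    # x = xmin + exp(x~)
    return x_min + np.exp(xs)
```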
[Figure: block diagram of data scaling in model development: training data pass through scaling to produce scaled data for training the neural network model; the trained neural network is then used with the same scaling applied to its input x.]
Validation error:
$$E_V(\mathbf{w}) = \frac{1}{\mathrm{size}(V)} \sum_{k \in V} \left[ \frac{1}{m} \sum_{j=1}^{m} \left| \frac{y_j(\mathbf{x}_k, \mathbf{w}) - d_{jk}}{y_{\max\,j} - y_{\min\,j}} \right|^q \right]^{1/q}$$

Test error:
$$E_{Te}(\mathbf{w}) = \frac{1}{\mathrm{size}(Te)} \sum_{k \in Te} \left[ \frac{1}{m} \sum_{j=1}^{m} \left| \frac{y_j(\mathbf{x}_k, \mathbf{w}) - d_{jk}}{y_{\max\,j} - y_{\min\,j}} \right|^q \right]^{1/q}$$
Training error: the training error $E_{Tr}(\mathbf{w})$ and its derivative $\partial E_{Tr}/\partial \mathbf{w}$ are used to determine how to update $\mathbf{w}$ during training.
Test error: The test error ETe(w) is used after training has finished
to provide a final assessment of the quality of the trained
neural network. Test error is not involved during training.
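A sketch of this normalized L_q error in NumPy; the same routine computes E_V or E_Te depending on whether the samples come from V or Te (the array shapes are illustrative assumptions):

```python
import numpy as np

def normalized_lq_error(Y, D, y_min, y_max, q=2):
    # Y:            model outputs y_j(x_k, w), shape (num_samples, m)
    # D:            desired outputs d_jk,      shape (num_samples, m)
    # y_min, y_max: per-output normalization constants, shape (m,)
    # Returns (1/size) * sum_k [ (1/m) * sum_j |(y_jk - d_jk)/(ymax_j - ymin_j)|^q ]^(1/q)
    e = np.abs((Y - D) / (y_max - y_min)) ** q
    per_sample = np.mean(e, axis=1) ** (1.0 / q)   # inner sum over the m outputs
    return np.mean(per_sample)                     # outer average over the sample set
```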
[Flowchart: START → select a neural network structure, e.g., MLP → training → perform feedforward computation for all samples in the test set → evaluate test error as an independent quality measure of the reliability of the trained neural network model → acceptable? Yes: STOP; No: return to training.]
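A sketch of the loop this flowchart describes; `make_structure`, `train`, and `test_error` are hypothetical placeholders (none of these names come from the slides), and growing the structure on the "No" branch is one assumed corrective action:

```python
def develop_model(make_structure, train, test_error, tol=0.02, max_rounds=5):
    # START: select a neural network structure, e.g., an MLP.
    for size in range(1, max_rounds + 1):
        model = make_structure(size)   # e.g., number of hidden neurons grows per round
        train(model)                   # "Training" step of the flowchart
        e_te = test_error(model)       # feedforward over all test samples, giving an
                                       # independent quality measure of the trained model
        if e_te <= tol:                # quality acceptable: STOP
            return model
    return model                       # best effort after max_rounds
```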
Neural Network Under-Learning

Definition (strict): $E_{Tr} \gg 0$, i.e., the training error itself remains large (and hence $E_V$ remains large as well).

[Figure: neural network output (y) vs. input, failing to follow the training data]

Possible reasons:
a) Not enough hidden neurons
b) Training stuck at a local solution
c) Not enough training

Actions:
a) Add hidden neurons
b) More training
c) Perturb the solution, then train
[Figure: neural network output vs. input x (from −5 to 15), plotted together with the training data and validation data samples]
• Batch-mode (or offline) training: ANN weights are updated after each epoch, i.e., the weight update is based on the training error from all the samples in the training data set, as sketched below.
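A sketch of one batch-mode epoch under this definition: the error gradient is accumulated over all training samples before a single weight update is applied. Here `grad_sample` is a hypothetical per-sample gradient routine and w is assumed to be a NumPy array of weights:

```python
import numpy as np

def batch_epoch(w, X_tr, D_tr, grad_sample, eta=0.01):
    # Accumulate dETr/dw over ALL training samples (one epoch)...
    g = np.zeros_like(w)
    for x_k, d_k in zip(X_tr, D_tr):
        g += grad_sample(w, x_k, d_k)
    # ...then apply a single update based on the full-batch error gradient.
    return w - eta * g / len(X_tr)
```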
Training Problem Statement:
Given training data (x_k, d_k), k ∈ Tr
Validation data (x_k, d_k), k ∈ V
NN model y(x, w)
Find values of w such that the validation error is minimized:
$$\min_{\mathrm{epoch}} E_V(\mathrm{epoch})$$

where

$$E_V(\mathrm{epoch}) \triangleq \frac{1}{P_V} \sum_{k \in V} \left[ \frac{1}{m} \sum_{j=1}^{m} \left| \frac{y_j(\mathbf{x}_k, \mathbf{w}(\mathrm{epoch})) - d_{jk}}{y_{\max\,j} - y_{\min\,j}} \right|^q \right]^{1/q}, \qquad P_V = \mathrm{size}(V)$$
w(epoch) = w(epoch − 1) + Δw(epoch − 1)
w|_(epoch=0) = user's / software initial guess
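A sketch of the epoch loop this statement implies: weights are updated epoch by epoch from the training error, E_V(epoch) is monitored, and the weights from the epoch with the smallest validation error are the ones returned. Here `train_epoch` and `validation_error` are hypothetical placeholders:

```python
import copy

def train_with_validation(w0, train_epoch, validation_error, max_epochs=1000):
    w = w0                                 # user's / software initial guess (epoch = 0)
    best_w, best_ev = copy.deepcopy(w), validation_error(w)
    for epoch in range(1, max_epochs + 1):
        w = train_epoch(w)                 # w(epoch) = w(epoch-1) + delta_w(epoch-1)
        ev = validation_error(w)           # E_V(epoch)
        if ev < best_ev:                   # keep the epoch that minimizes E_V
            best_w, best_ev = copy.deepcopy(w), ev
    return best_w, best_ev
```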
The weight update has the form Δw = η h, where η is the step size and h is the update direction; training algorithms differ mainly in how h is chosen.
Example error functions:
$$E_{Tr}(\mathbf{w}) = (w_1 - 1)^2 + (w_2 - 2)^2$$
$$E_{Tr}(\mathbf{w}) = 4(w_1 - 1)^2 + (w_2 - 2)^2$$
$$E_{Tr}(\mathbf{w}) = \big(1.73(w_1 - 1) - (w_2 - 2)\big)^2 + 0.25\big((w_1 - 1) + 1.73(w_2 - 2)\big)^2$$

[Figure: contour plots of the three error functions in the (w1, w2) plane]
[Figure: error contours in the (w1, w2) plane showing, at the current location of w, the gradient direction, the negative gradient direction, and the conjugate gradient direction]
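To make these directions concrete, here is a sketch on the third example error function above. The conjugate direction uses the Fletcher-Reeves rule h_new = −g_new + β h_old with β = (g_newᵀ g_new)/(g_oldᵀ g_old), which is one standard choice; the slides do not specify a variant, and the fixed step sizes stand in for a line search:

```python
import numpy as np

def etr(w):
    # ETr(w) = (1.73(w1-1) - (w2-2))^2 + 0.25((w1-1) + 1.73(w2-2))^2
    a = 1.73 * (w[0] - 1) - (w[1] - 2)
    b = (w[0] - 1) + 1.73 * (w[1] - 2)
    return a**2 + 0.25 * b**2

def grad_etr(w):
    a = 1.73 * (w[0] - 1) - (w[1] - 2)
    b = (w[0] - 1) + 1.73 * (w[1] - 2)
    return np.array([2 * a * 1.73 + 0.5 * b, -2 * a + 0.5 * 1.73 * b])

w0 = np.array([0.0, 0.0])
g_old = grad_etr(w0)
h = -g_old                                # negative gradient (steepest descent) direction
w1 = w0 + 0.1 * h                         # one step along the negative gradient
g_new = grad_etr(w1)
beta = g_new @ g_new / (g_old @ g_old)    # Fletcher-Reeves coefficient
h = -g_new + beta * h                     # conjugate gradient direction
w2 = w1 + 0.05 * h
print(etr(w0), etr(w1), etr(w2))          # the error decreases at each step
```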
Speed: fast
Main memory needed: $N_W^2$ (large)
$\mathbf{J}$ is the Jacobian, $\mathbf{J} = \left(\dfrac{\partial \mathbf{e}}{\partial \mathbf{w}}\right)^T$
λ > 0: typical Levenberg-Marquardt
λ = 0: Gauss-Newton
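A sketch of one such update step in the standard form (JᵀJ + λI) Δw = −Jᵀe; this specific form is an assumption, since the slides give only the role of λ:

```python
import numpy as np

def lm_step(J, e, lam):
    # Solve (J^T J + lam * I) dw = -J^T e for the weight update dw.
    # lam > 0: typical Levenberg-Marquardt; lam = 0: Gauss-Newton.
    n = J.shape[1]
    return np.linalg.solve(J.T @ J + lam * np.eye(n), -J.T @ e)
```

Forming JᵀJ produces an N_W × N_W matrix, which is the source of the $N_W^2$ memory cost noted above.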
Convergence rate (fast → slow) vs. memory needed (more → less):
• Levenberg-Marquardt (for small-residue problems): fastest convergence, most memory
• Quasi-Newton
• Conjugate Gradient
• BP: slowest convergence, least memory
Training Algorithm      No. of Epochs   Training Error (%)   Avg. Test Error (%)   CPU (s)
Adaptive
Conjugate-Gradient
Quasi-Newton
Levenberg-Marquardt

Training Algorithm      No. of Epochs   Training Error (%)   Avg. Test Error (%)   CPU (s)
Adaptive
Conjugate Gradient
Levenberg-Marquardt