
Training of Neural Networks

Q.J. Zhang, Carleton University


Notation:
x: input of the original modeling problem or the neural network

y: output of the original modeling problem or the neural network

w: internal weights/parameters of the neural network

m: number of outputs of the model

y = f(x , w) : neural network model

d: data for y (e.g., training data)



Define Model Input-Output

Define model input-output (x, y), for example,

x: physical/geometrical parameters of the component


y: S-parameters of the component



Data Generation:

(a) Generate (x, y) samples (xk, yk), k = 1, 2, …, P, such that
the finished NN accurately represents the original x-y
problem

(b) Data generator
• Measurement: for each given xk, measure the values of yk,
k = 1, 2, …, P
• Simulation: for each given xk, use a simulator to
calculate yk, k = 1, 2, …, P



Comparison of Neural Network Based Microwave Model Development
Using Data from Two Types of Data Generators
Basis of Comparison: Availability of Problem Theory-Equations
• Measurement data: model can be developed even if the theory-equations are not known, or are difficult to implement in CAD.
• Simulation data: model can be developed only for problems whose theory is implemented in a simulator.

Basis of Comparison: Assumptions
• Measurement data: no assumptions are involved, and the model can include all the effects, e.g., 3D full-wave effects, fringing effects, etc.
• Simulation data: often involves assumptions, and the model will be limited by the assumptions made by the simulator, e.g., 2.5D EM.

Basis of Comparison: Input Parameter Sweep
• Measurement data: data generation can be expensive or infeasible if a geometrical parameter, e.g., transistor gate length, needs to be sampled/changed.
• Simulation data: relatively easy to sweep any parameter in the simulator, because the changes are numerical and not physical/manual.



Comparison of Neural Network Based Microwave Model Development
Using Data from Two Types of Data Generators (continued)

Basis of Comparison: Sources of Small and Large/Gross Errors
• Measurement data: equipment limitations and tolerances.
• Simulation data: accuracy limitations and non-convergence of simulations.

Basis of Comparison: Feasibility of Getting Desired Output
• Measurement data: development of models is possible for measurable responses only. For example, the drain charge of an FET may not be easy to measure.
• Simulation data: any response can be modeled as long as it can be computed by the simulator.



Data Generation:
(c) Range of x to be sampled

• For testing data and validation data:
xmin ~ xmax should represent the user-intended range
in which the NN is to be used.

• For training data:
the default range of x samples should equal the user-intended
range, or, if feasible, extend slightly beyond it.



Data Generation
- where data should be sampled

[Figure: x samples distributed in a three-dimensional (x1, x2, x3) input space.]

(d) Distribution of x samples

• Uniform grid distribution
• Non-uniform grid distribution
• Design of Experiments (DOE) methodology:
central-composite design, 2^n factorial design
• Star distribution
• Random distribution
Data Generation (continued):

(e) Number of samples P -- theoretical factors:

• For the grid distribution case: Shannon's sampling theorem
• For the random distribution case: statistical confidence



Input / Output Scaling
The orders of magnitude of the various x and d values in
microwave applications can be very different from one
another.

Scaling of the training data is desirable for efficient neural
network training.

The data can be scaled such that the various x (or d) values have
a similar order of magnitude.



Input / Output Scaling:
Notation:
x and y -- original x and y
x̃ and ỹ -- scaled x and y
xmin, xmax -- obtained from data
x̃min, x̃max -- dictated by the NN trainer

• Linear scale
Scale formula:    x̃ = x̃min + (x − xmin)/(xmax − xmin) · (x̃max − x̃min)
De-scale formula: x = xmin + (x̃ − x̃min)/(x̃max − x̃min) · (xmax − xmin)

• Log scale
Scale formula:    x̃ = ln(x − xmin)
De-scale formula: x = xmin + e^x̃
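The linear scale and de-scale formulas can be written as a pair of small functions. This is an illustrative sketch: the function names and the default trainer-dictated range [-1, 1] are assumptions, not from the slides.

```python
def linear_scale(x, x_min, x_max, xt_min=-1.0, xt_max=1.0):
    """Map x from the data range [x_min, x_max] to the
    trainer-dictated range [xt_min, xt_max]."""
    return xt_min + (x - x_min) / (x_max - x_min) * (xt_max - xt_min)

def linear_descale(xt, x_min, x_max, xt_min=-1.0, xt_max=1.0):
    """Invert linear_scale back to the original data range."""
    return x_min + (xt - xt_min) / (xt_max - xt_min) * (x_max - x_min)

# Round trip on a sample value, e.g., a frequency between 1 and 10 (GHz assumed)
xt = linear_scale(5.5, 1.0, 10.0)
x_back = linear_descale(xt, 1.0, 10.0)
```

The de-scale function is what the finished model uses at its output, so the user never sees the scaled quantities.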



Illustration of Data Scaling
[Figure: flow from data generation, to data scaling, to neural network training, to the finished model for the user. The training data (x, d) are scaled before training; after training, the neural network is wrapped with input scaling and output de-scaling to form the finished model.]



Divide Data into Training Set,
Validation Set and Testing Set
Notation:
P – total number of data samples generated
D – Set for all data, D = {1, 2, …, P}
Tr – Training data set
V -- Validation data set
Te -- Test data set

Ideally: each data set (Tr, V, Te) should be an adequate
representation of the original y = f(x) problem over the
entire xmin ~ xmax range. The three sets have no overlap.



Divide Data into Training Set,
Validation Set and Testing Set

Case 1: When the original data is quite sufficient, split D into
three non-overlapping sets.

Case 2: When the data is very limited, duplicate D, such that
Tr = V = Te = D.

Case 3: Otherwise, split the data D into two sets.
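Case 1 can be sketched as a random split of the sample indices. The 60/20/20 fractions and the fixed seed are illustrative assumptions; the slides do not prescribe specific fractions.

```python
import random

def split_data(P, f_tr=0.6, f_v=0.2, seed=0):
    """Split D = {0, ..., P-1} into non-overlapping Tr, V, Te (Case 1).
    Shuffling first helps each set cover the whole x range."""
    D = list(range(P))
    random.Random(seed).shuffle(D)
    n_tr, n_v = int(f_tr * P), int(f_v * P)
    return D[:n_tr], D[n_tr:n_tr + n_v], D[n_tr + n_v:]

Tr, V, Te = split_data(100)
```

The returned index sets are disjoint by construction and together cover all P samples.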



Training / Validation and Testing
Training error:
  ETr(w) = [ (1/size(Tr)) (1/m) Σ_{k∈Tr} Σ_{j=1..m} | (yj(xk, w) − djk) / (ymax,j − ymin,j) |^q ]^{1/q}

Validation error:
  EV(w) = [ (1/size(V)) (1/m) Σ_{k∈V} Σ_{j=1..m} | (yj(xk, w) − djk) / (ymax,j − ymin,j) |^q ]^{1/q}

Test error:
  ETe(w) = [ (1/size(Te)) (1/m) Σ_{k∈Te} Σ_{j=1..m} | (yj(xk, w) − djk) / (ymax,j − ymin,j) |^q ]^{1/q}

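The same normalized q-norm error applies to each of the three sets. A minimal sketch, assuming the model outputs y have already been evaluated for every sample in the set (the function name and list-of-vectors layout are illustrative):

```python
def set_error(y, d, y_min, y_max, q=2):
    """Normalized q-norm error over one data set (Tr, V, or Te).
    y, d : lists of m-element output/data vectors, one pair per sample k.
    y_min, y_max : per-output normalization bounds (ymin,j and ymax,j)."""
    m = len(y_min)
    total = 0.0
    for yk, dk in zip(y, d):
        for j in range(m):
            total += abs((yk[j] - dk[j]) / (y_max[j] - y_min[j])) ** q
    return (total / (len(y) * m)) ** (1.0 / q)
```

With q = 2 this is a normalized root-mean-square error; a perfect fit gives exactly 0.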


Where to Use Each Error Criterion

Training error: the training error ETr(w) and its derivative ∂ETr/∂w
are used to determine how to update w during training.

Validation error: the validation error EV(w) is used as a stopping
criterion during training, i.e., to determine whether training
is sufficient.

Test error: the test error ETe(w) is used after training has finished
to provide a final assessment of the quality of the trained
neural network. The test error is not involved during training.
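The role of the validation error as a stopping criterion can be sketched as an early-stopping loop. The `update` routine, the patience threshold, and the toy scalar example are illustrative assumptions, not from the slides.

```python
def train_with_early_stopping(update, E_val, w0, patience=5, max_epoch=100):
    """Stop when the validation error E_val has not improved for
    `patience` consecutive epochs; return the best weights seen."""
    w, best_w, best_ev, wait = w0, w0, float("inf"), 0
    for epoch in range(max_epoch):
        w = update(w)               # one epoch of training-error-driven updates
        ev = E_val(w)
        if ev < best_ev:
            best_ev, best_w, wait = ev, w, 0
        else:
            wait += 1
            if wait >= patience:    # validation error stopped improving
                break
    return best_w

# Toy scalar example: updates pull w toward 2; validation error is (w - 2)^2
best = train_with_early_stopping(lambda w: w + 0.5 * (2.0 - w),
                                 lambda w: (w - 2.0) ** 2, 0.0)
```

Keeping the best-so-far weights, rather than the last ones, is what guards against overlearning past the validation minimum.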



Flow-chart Showing Neural Network Training, Validation and Testing

The flow-chart can be summarized as the following loop:

START
1. Select a neural network structure, e.g., MLP.
2. Assign random initial values to all the weight parameters.
3. Perform feedforward computation for all samples in the training set; evaluate the training error.
4. Compute derivatives of the training error w.r.t. the ANN internal weights (e.g., by backpropagation).
5. Update the neural network weight parameters using a gradient-based training algorithm (e.g., BP, quasi-Newton).
6. Perform feedforward computation for all samples in the validation set; evaluate the validation error.
7. If the desired accuracy is not achieved, go to step 3; otherwise, STOP training.
8. Perform feedforward computation for all samples in the test set; evaluate the test error as an independent quality measure of the trained neural network.



Initial Value of NN Weights Before Training
MLP: small random values

RBF/Wavelet: estimate the centers and widths of the RBFs,
or the translations and dilations of the wavelets

Knowledge-Based NN: use physical/electrical experience



Overlearning
Definition (strict):
Math: ETr ≈ 0, but EV >> ETr

Observation: the NN has memorized the training data but cannot
generalize well.

Possible reasons:
a) Too many hidden neurons
b) Not enough training data

Actions:
a) Add training data
b) Delete hidden neurons
c) Back up / retrieve a previous solution
Neural Network Over-Learning

[Figure: output y versus input x over the range −5 to 15. The neural network curve passes through the training data but oscillates between them, missing the validation data.]


Underlearning
Definition (strict):
Math: ETr >> 0

Observation: the NN cannot even represent the problem at the
training points.

Possible reasons:
a) Not enough hidden neurons
b) Training stuck at a local minimum
c) Not enough training

Actions:
a) Add hidden neurons
b) Train more
c) Perturb the solution, then continue training
Neural Network Under-Learning

[Figure: output y versus input x over the range −5 to 15. The neural network curve is too smooth and misses both the training data and the validation data.]



Perfect Learning:

Definition (strict):
Math: ETr ≈ EV, with both ≈ 0

Observation: the NN generalizes well.



Perfect Learning of Neural Networks

[Figure: output y versus input x over the range −5 to 15. The neural network curve passes smoothly through both the training data and the validation data.]



Types of Training

• Sample-by-sample (or online) training: ANN weights are updated


every time a training sample is presented to the network, i.e., weight
update is based on training error from that sample

• Batch-mode (or offline) training: ANN weights are updated after each
epoch, i.e., weight update is based on training error from all the
samples in training data set

where an epoch is defined as a stage of ANN training that


involves presentation of all the samples in the training data set to
the neural network once for the purpose of learning

• Supervised training: uses both x and y data for training

• Unsupervised training: uses only x data for training



Neural Network Training

The error between the training data and the neural network outputs
is fed back to the neural network to guide the update of the
network's internal weights.

[Figure: training data (x, d) feed the neural network; the training error d − y is fed back to update the internal weights w.]
Training Problem Statement:
Given training data (xk, dk), k ∈ Tr,
validation data (xk, dk), k ∈ V,
and the NN model y(x, w),
find values of w such that the validation error is minimized:

  min over epoch of EV(epoch)

where
  EV(epoch) = [ (1/PV) (1/m) Σ_{k∈V} Σ_{j=1..m} | (yj(xk, w(epoch)) − djk) / (ymax,j − ymin,j) |^q ]^{1/q}
  PV = size(V)
  w(epoch) = w(epoch − 1) + Δw(epoch − 1)
  w at epoch = 0 is the user's / software's initial guess

Δw(epoch − 1) is the update determined by the optimization
algorithm (training algorithm), which minimizes the training
error.
Steps of Gradient-Based Training Algorithms:

Step 1: w = initial guess; epoch = 0

Step 2: If EV(epoch) ≤ ε (given accuracy criterion)
or epoch > max_epoch (maximum number of epochs),
stop.

Step 3: Calculate ETr(w) and ∂ETr(w)/∂w using part or all
of the training data.

Step 4: Use the optimization algorithm to find Δw;
w ← w + Δw

Step 5: If all training data have been used, then
epoch = epoch + 1 and go to Step 2; else go to Step 3.
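The five steps above can be sketched on a toy two-weight problem. This is a minimal illustration, assuming full-batch steepest-descent updates with a fixed step size for Step 4 (the slides allow any optimization algorithm there) and using the training error itself in the Step 2 stopping test.

```python
def train(E, grad_E, w0, eta=0.1, eps=1e-6, max_epoch=1000):
    w, epoch = list(w0), 0                           # Step 1: initial guess
    while E(w) > eps and epoch < max_epoch:          # Step 2: stopping test
        g = grad_E(w)                                # Step 3: error gradient
        w = [wi - eta * gi for wi, gi in zip(w, g)]  # Step 4: w <- w + delta_w
        epoch += 1                                   # Step 5: next epoch
    return w, epoch

# Toy "training error" with its minimum at w = (1, 2)
E = lambda w: (w[0] - 1.0) ** 2 + (w[1] - 2.0) ** 2
grad_E = lambda w: [2.0 * (w[0] - 1.0), 2.0 * (w[1] - 2.0)]
w_star, n_epochs = train(E, grad_E, [0.0, 0.0])
```

On this quadratic the loop converges long before max_epoch; real NN training replaces E and grad_E with the feedforward and backpropagation computations.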



Update w in Gradient-Based Methods

  Δw = η h

where h is the direction of the update of w,
and η is the step size of the update of w.

Gradient-based methods use the information in ETr(w) and ∂ETr(w)/∂w
to determine the update direction of w.

The step size η is determined by:

• a small fixed constant set by the user
• an adaptive constant during training
• a line minimization method to find the best value of η



Line Minimization Problem Statement
Let a scalar function of one variable be defined as
  f(η) = ETr(w + η h)
Given the present value of w and direction h,
find η such that f(η) is minimized.

Solution methods (1-dimensional optimization):

Sectioning methods: golden section method, Fibonacci method, bisection method
Interpolation methods: quadratic method, cubic method
 

Back-Propagation (BP) (Rumelhart, Hinton & Williams, 1986)

We use the negative gradient direction h = −∂ETr(w)/∂w
for Δw = η h.

The neural network weights are updated during training as:

  w ← w − η ∂ETr(w)/∂w

or, with momentum:

  w ← w − η ∂ETr(w)/∂w + α Δw|epoch−1

where η is called the learning rate
and α is called the momentum factor.
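The BP update with momentum can be written as a one-line weight step. A minimal sketch, assuming the gradient is already available as a list and the η, α values shown are typical defaults, not prescribed by the slides:

```python
def bp_update(w, grad, prev_dw, eta=0.05, alpha=0.9):
    """One BP step: delta_w = -eta * grad + alpha * delta_w(previous epoch).
    Returns the updated weights and the delta_w to carry to the next step."""
    dw = [-eta * g + alpha * p for g, p in zip(grad, prev_dw)]
    return [wi + di for wi, di in zip(w, dw)], dw

# First step: no previous update, so momentum contributes nothing
w1, dw1 = bp_update([0.0, 0.0], [1.0, -1.0], [0.0, 0.0])
```

On the next call, dw1 is passed as prev_dw, so the momentum term smooths successive updates.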
Determining η and α for BP:
• Set η and α as fixed constants.
• η and α can be adaptive, e.g., η = c / epoch, where c is a constant.

• Search-then-converge (STC) schedule (Darken, 1992):

  η = η0 · [1 + (c/η0)(epoch/τ)] / [1 + (c/η0)(epoch/τ) + τ·(epoch/τ)²]

where η0, τ, and c are user-defined.

• Delta-bar-delta (Jacobs, 1988):
(a) a separate ηi for each weight wi in w of the NN: wi ← wi − ηi ∂ETr(w)/∂wi
(b) each ηi is adjusted during training using present and previous
information of ∂ETr(w)/∂wi


Concept of Contour Plots
To illustrate the process of how the w vector changes, we can use
contour plots.
Simple examples of contour plots with 2 variables, w = [w1, w2]:

  ETr(w) = (w1 − 1)² + (w2 − 2)²
  ETr(w) = 4(w1 − 1)² + (w2 − 2)²
  ETr(w) = (1.73(w1 − 1) − (w2 − 2))² + 0.25((w1 − 1) + 1.73(w2 − 2))²

[Figure: contours of each function in the (w1, w2) plane; arrows show the direction of the gradient vector ∂ETr/∂w.]

The gradient vector is always perpendicular to the contour.
For BP, w moves along the negative direction of the gradient.
Conjugate Gradient Method

  Δw = η h

Let ∇E = ∂ETr(w)/∂w. Then

  h(epoch) = −∇E + γ h(epoch − 1),  with h(0) = −∇E

  γ = ‖∇E(epoch)‖² / ‖∇E(epoch − 1)‖²  (Fletcher-Reeves)

  γ = (∇E(epoch) − ∇E(epoch − 1))ᵀ ∇E(epoch) / ‖∇E(epoch − 1)‖²  (Polak-Ribiere)

η is determined by a line minimization method or a trust region method.

Speed: generally faster than BP.

Memory: a few vectors of length NW, where NW is the total number of NN
weights/parameters in w.
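The Fletcher-Reeves direction update can be sketched in a few lines; the function names and list-based gradients are illustrative assumptions.

```python
def fletcher_reeves_gamma(g_new, g_old):
    """gamma = ||grad(epoch)||^2 / ||grad(epoch-1)||^2 (Fletcher-Reeves)."""
    return sum(g * g for g in g_new) / sum(g * g for g in g_old)

def cg_direction(g_new, h_old, g_old):
    """h(epoch) = -grad(epoch) + gamma * h(epoch-1)."""
    gamma = fletcher_reeves_gamma(g_new, g_old)
    return [-gn + gamma * ho for gn, ho in zip(g_new, h_old)]

# One direction update from gradient [1, 0] (previous h = [-1, 0]) to [2, 0]
h = cg_direction([2.0, 0.0], [-1.0, 0.0], [1.0, 0.0])
```

Only the previous gradient and direction need to be stored, which is where the "few vectors of length NW" memory figure comes from.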
Illustration of Conjugate Direction
Simple examples of contour plots with 2 variables, w = [w1, w2]:

[Figure: contour plot in the (w1, w2) plane showing, at the current location of w, the gradient direction, the negative gradient direction, and the conjugate gradient direction.]



Quasi-Newton Method
Let H be the Hessian matrix of ETr w.r.t. w,
and B be the inverse of H.
Weight update: Δw = η h, where h = −B ∇E.

Use the information in Δw and Δg to approximate B:

  B(0) = I
  B(epoch) = B(epoch−1) + Δw Δwᵀ / (Δwᵀ Δg)
             − B(epoch−1) Δg Δgᵀ B(epoch−1) / (Δgᵀ B(epoch−1) Δg)
  (DFP formula)

where Δw = w(epoch) − w(epoch−1)
and Δg = ∇E(epoch) − ∇E(epoch−1).

Speed: fast.
Main memory needed: on the order of NW² (large).
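The DFP rank-two update can be sketched with plain nested lists (a toy sketch; real implementations use a linear algebra library, and B is assumed symmetric here):

```python
def dfp_update(B, dw, dg):
    """DFP update of the approximate inverse Hessian B:
    B + dw dw^T/(dw^T dg) - (B dg)(B dg)^T/(dg^T B dg)."""
    n = len(dw)
    Bg = [sum(B[i][j] * dg[j] for j in range(n)) for i in range(n)]  # B dg
    wTg = sum(dw[i] * dg[i] for i in range(n))                       # dw^T dg
    gBg = sum(dg[i] * Bg[i] for i in range(n))                       # dg^T B dg
    return [[B[i][j] + dw[i] * dw[j] / wTg - Bg[i] * Bg[j] / gBg
             for j in range(n)] for i in range(n)]

# One update from B = I on a quadratic whose Hessian is 2I
B1 = dfp_update([[1.0, 0.0], [0.0, 1.0]], [1.0, 0.0], [2.0, 0.0])
```

The updated B satisfies the secant condition B Δg = Δw, which is what lets the method mimic Newton steps without ever forming the Hessian.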



Levenberg-Marquardt Method
Obtain Δw by solving the linear equations

  (Jᵀ J + μ I) Δw = −Jᵀ e

where e = [e1, e2, …, eNe]ᵀ, with each
  ei = (yj(xk, w) − djk) / (ymax,j − ymin,j)
for an (j, k) output/sample pair, and J is the Jacobian, J = (∂eᵀ/∂w)ᵀ.

  μ > 0: typical Levenberg-Marquardt
  μ = 0: Gauss-Newton

This method is good if e can be very small, i.e., small-residue
problems.
The computation needs an LU decomposition.
Main memory needed: on the order of NW² (large).
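For a single weight the linear system collapses to a scalar equation, which makes the role of μ easy to see. This one-parameter sketch is an illustration, not the general matrix solve:

```python
def lm_step_1d(J, e, mu):
    """Levenberg-Marquardt step for one weight: (J^T J + mu) dw = -J^T e.
    J, e : lists with one entry per residual. mu = 0 gives Gauss-Newton."""
    JTJ = sum(j * j for j in J)
    JTe = sum(j * ei for j, ei in zip(J, e))
    return -JTe / (JTJ + mu)

dw_gn = lm_step_1d([1.0, 1.0], [1.0, 1.0], 0.0)   # Gauss-Newton step
dw_lm = lm_step_1d([1.0, 1.0], [1.0, 1.0], 2.0)   # damped LM step
```

Increasing μ shrinks the step and rotates it toward steepest descent, which is how LM stays stable far from the solution.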
Other Training Methods
Huber-Quasi-Newton

Similar to quasi-Newton, except that the error function
for training is based on the Huber function rather than the
conventional least-squares error function.

The Huber formulation allows the training algorithm
to robustly handle both small random errors
and accidental large errors in the training data.
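The standard Huber function is quadratic for small residuals and linear for large ones; the threshold value δ = 1 below is an assumed default.

```python
def huber(r, delta=1.0):
    """Huber function of a residual r: 0.5*r^2 for |r| <= delta,
    delta*(|r| - 0.5*delta) beyond, so gross errors grow only linearly."""
    return 0.5 * r * r if abs(r) <= delta else delta * (abs(r) - 0.5 * delta)
```

Summing huber(ei) over all residuals in place of Σ ei² gives the robust training error that this method minimizes: a single gross outlier contributes linearly rather than quadratically, so it cannot dominate the fit.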



Other Training Methods (continued)
Simplex Method: uses information from ETr(w) only.

The method starts with several initial guesses of w. These
initial points form a simplex in the w space.

The method then iteratively updates the simplex using basic
moves such as reflection, expansion, and contraction, according
to the values of ETr(w) at the vertices of the simplex.

The error ETr(w) generally decreases as the simplex is updated.



Other Training Methods (continued)
Genetic Algorithm: uses information from ETr(w) only; searches
for a global minimum.

The algorithm starts with several initial points of w,
called a population of w.

A fitness value is defined for each w such that a w with
lower error ETr(w) has higher fitness.

w points with high fitness values are more likely to be selected
as parents, from whom new points of w, called offspring,
are produced.

This process continues, and the fitness of the population
improves.
Other Training Methods (continued)
Particle Swarm Optimization (PSO): uses information from ETr(w) only;
searches for a global minimum.

The algorithm starts with several initial points of w. Each point
of w is called a particle, and all the particles together are
called a swarm of w.

Let pb represent the historical best of a particle w, and gb
the historical best of all particles w in the swarm.

Let v be defined as the velocity of a particle w, computed as:

  v = c0·vold + c1·r1·(pb − w) + c2·r2·(gb − w)

where c0, c1, and c2 are constant weight parameters, r1 and r2
are random values between 0 and 1, and vold represents the v of the
particle in the previous iteration.

Each particle is then updated by w = w + v.
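The velocity and position update for a single particle can be sketched as below; the constants c0, c1, c2 are typical assumed values, not prescribed by the slides.

```python
import random

_rng = random.Random(0)  # fixed seed for a reproducible sketch

def pso_step(w, v, pb, gb, c0=0.7, c1=1.5, c2=1.5):
    """One PSO update for one particle:
    v = c0*v_old + c1*r1*(pb - w) + c2*r2*(gb - w);  w = w + v."""
    r1, r2 = _rng.random(), _rng.random()
    v_new = [c0 * vi + c1 * r1 * (p - wi) + c2 * r2 * (g - wi)
             for wi, vi, p, g in zip(w, v, pb, gb)]
    w_new = [wi + vi for wi, vi in zip(w, v_new)]
    return w_new, v_new
```

A particle sitting at both its own best and the swarm best, with zero velocity, stays put; otherwise it is pulled toward a random blend of the two bests.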


Qualitative Comparison of Different Algorithms

Convergence rate (fastest to slowest), which also orders the algorithms
from more to less memory need and implementation effort:

• Levenberg-Marquardt (for small-residue problems)
• Quasi-Newton
• Conjugate gradient
• BP


Comparison of Training Algorithms for 3-Conductor Microstrip Line
Example (5 input neurons, 28 hidden neurons, 5 output neurons)

Training Algorithm          No. of Epochs   Training Error (%)   Avg. Test Error (%)   CPU (s)
Adaptive Backpropagation        10755             0.224                 0.252            13724
Conjugate Gradient               2169             0.415                 0.473             5511
Quasi-Newton                     1007             0.227                 0.242             2034
Levenberg-Marquardt                20             0.276                 0.294             1453



Comparison of Training Algorithms for MESFET Example
(4 input neurons, 60 hidden neurons, 8 output neurons)

Training Algorithm          No. of Epochs   Training Error (%)   Avg. Test Error (%)   CPU (s)
Adaptive Backpropagation        15319             0.98                  1.04             11245
Conjugate Gradient               1605             0.99                  1.04              4391
Quasi-Newton                      570             0.88                  0.89              1574
Levenberg-Marquardt                12             0.97                  1.03              4322

