Artificial Neural Networks - DL
Regularisation
- L2
- Dropout
Weight initialisation
- Uniform distribution
- He initialisation
- Xavier initialisation
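A minimal NumPy sketch of the three schemes listed above, for a single fully connected layer; the layer sizes and the uniform range are illustrative assumptions, not values from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_out = 784, 256  # illustrative layer sizes

# Plain uniform initialisation in a small, fixed range
W_uniform = rng.uniform(-0.05, 0.05, size=(n_in, n_out))

# Xavier/Glorot initialisation (suits sigmoid/tanh layers):
# range scaled by the number of inputs and outputs
limit = np.sqrt(6.0 / (n_in + n_out))
W_xavier = rng.uniform(-limit, limit, size=(n_in, n_out))

# He initialisation (suits ReLU layers): normal with std = sqrt(2 / n_in)
W_he = rng.normal(0.0, np.sqrt(2.0 / n_in), size=(n_in, n_out))
```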
Learning rate
Hyperparameters
- Epochs
- Batch Size
- Number of neurons
- Momentum
Training of a neural net
Overview of Learning
1) Model initialisation
2) Forward propagate
3) Loss function
4) Optimising weights
5) Backpropagation
6) Weight update
7) Iteration until convergence
1) Model initialisation
x (input)    y (target)
0            0
1            2
2            4
3            6
4            8
2) Forward propagate
x    prediction (initial weight w = 3)
0    0
1    3
2    6
3    9
4    12
3) Loss function
x     prediction    target    absolute error    squared error
0     0             0         0                 0
1     3             2         1                 1
2     6             4         2                 4
3     9             6         3                 9
4     12            8         4                 16
Total                         10                30
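Steps 1-3 of the walkthrough fit in a few lines of NumPy; a sketch that reproduces the tables above (training data y = 2x, weight initialised to 3, error totals of 10 and 30):

```python
import numpy as np

x = np.array([0, 1, 2, 3, 4], dtype=float)  # inputs
y = 2 * x                                   # targets: y = 2x
w = 3.0                                     # step 1: model initialised with w = 3

y_hat = w * x                               # step 2: forward propagate -> 0, 3, 6, 9, 12

abs_error = np.sum(np.abs(y_hat - y))       # step 3: total absolute error -> 10
sq_error = np.sum((y_hat - y) ** 2)         # total squared error -> 30
print(abs_error, sq_error)
```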
4) Optimising the weight
4a) Differentiation
x     target    prediction (w = 3)    squared error    prediction (w = 3.0001)    squared error
0     0         0                     0                0                          0
1     2         3                     1                3.0001                     1.0002
2     4         6                     4                6.0002                     4.0008
3     6         9                     9                9.0003                     9.0018
4     8         12                    16               12.0004                    16.0032
Total                                 30                                          30.006
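The same finite-difference estimate in code: nudge the weight by 0.0001, recompute the total squared error, and divide the change in error by the change in weight to approximate the derivative.

```python
import numpy as np

x = np.array([0, 1, 2, 3, 4], dtype=float)
y = 2 * x

def total_squared_error(w):
    return np.sum((w * x - y) ** 2)

w, eps = 3.0, 1e-4
e0 = total_squared_error(w)        # 30.0
e1 = total_squared_error(w + eps)  # ~30.006
slope = (e1 - e0) / eps            # ~60: the error grows when w grows,
print(e0, e1, slope)               # so w should be decreased
```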
4b) Moving across the error curve
5) Backpropagation
• If the learning rate is too big, you can never converge to the low point; if it is too small, you will take a lot of time to converge. So we need to strike a balance and find an optimum value.
• Several weight update methods exist. These methods are called optimisers. The delta rule is the simplest and most intuitive one; we call it standard gradient descent (see the sketch below).
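A minimal sketch of one delta-rule update on the toy example; the learning rate of 0.01 is an illustrative choice, not a value from the slides.

```python
learning_rate = 0.01   # illustrative: too big diverges, too small crawls
w = 3.0
gradient = 60.0        # slope estimated in step 4a
w = w - learning_rate * gradient  # delta rule: step against the gradient
print(w)               # 2.4 -- already closer to the true weight of 2
```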
7) Iteration until convergence
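Continuing the sketch, the forward pass, gradient estimate, and weight update are repeated until the gradient (and hence the error) stops shrinking:

```python
import numpy as np

x = np.array([0, 1, 2, 3, 4], dtype=float)
y = 2 * x
w, learning_rate, eps = 3.0, 0.01, 1e-4

def total_squared_error(w):
    return np.sum((w * x - y) ** 2)

for step in range(100):
    grad = (total_squared_error(w + eps) - total_squared_error(w)) / eps
    w -= learning_rate * grad      # delta-rule update
    if abs(grad) < 1e-3:           # crude convergence check
        break

print(step, w)                     # w converges to ~2, the slope of the training data
```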
GPU vs CPU
Calculating Output
Loss function / Error
Hidden layer weight updates
Finally, we’ve updated all of our weights! When we fed forward the 0.05 and 0.1 inputs originally,
the error on the network was 0.298371109. After this first round of backpropagation, the total error
is now down to 0.291027924. It might not seem like much, but after repeating this process 10,000
times, for example, the error plummets to 0.0000351085. At this point, when we feed forward 0.05
and 0.1, the two output neurons generate 0.015912196 (vs 0.01 target) and 0.984065734 (vs 0.99 target).
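The numbers quoted above come from Matt Mazur's "A Step by Step Backpropagation Example": a 2-2-2 network with sigmoid activations, inputs 0.05 and 0.10, and targets 0.01 and 0.99. A sketch of its forward pass, assuming that example's published initial weights (they are not listed on this slide):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([0.05, 0.10])            # inputs
t = np.array([0.01, 0.99])            # targets

# Initial weights/biases taken from Mazur's example (assumed here)
W1 = np.array([[0.15, 0.20],          # input  -> hidden
               [0.25, 0.30]])
b1 = 0.35
W2 = np.array([[0.40, 0.45],          # hidden -> output
               [0.50, 0.55]])
b2 = 0.60

h = sigmoid(W1 @ x + b1)              # hidden activations ~ [0.593, 0.597]
o = sigmoid(W2 @ h + b2)              # outputs ~ [0.751, 0.773]
total_error = np.sum(0.5 * (t - o) ** 2)
print(total_error)                    # ~0.298371, the starting error quoted above
```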
Simulation
https://www.mladdict.com/linear-regression-simulator
https://www.mladdict.com/neural-network-simulator
Effect of Batch size
• Updating the parameters only once per pass over all of the training data is not efficient. If you use only part of the data (a mini-batch) at a time, you can update the parameters several times per pass, as sketched below.
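A sketch of this on the earlier toy model: with a batch size of 2 (an illustrative choice) the weight is updated three times per pass over the five data points instead of once.

```python
import numpy as np

x = np.array([0, 1, 2, 3, 4], dtype=float)
y = 2 * x
w, lr, batch_size = 3.0, 0.01, 2               # batch size 2 is an illustrative choice

for epoch in range(20):
    for start in range(0, len(x), batch_size):
        xb, yb = x[start:start + batch_size], y[start:start + batch_size]
        grad = np.sum(2 * (w * xb - yb) * xb)  # gradient of the batch's squared error
        w -= lr * grad                         # one update per mini-batch
print(w)                                       # ~2.0
```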
Activation functions
• Sigmoid
• Tanh
• ReLU (Rectified Linear Unit)
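Minimal NumPy definitions of the three activations and their derivatives; the derivative of the sigmoid is the z*(1-z) local gradient discussed under "Vanishing gradient" below.

```python
import numpy as np

def sigmoid(s):
    return 1.0 / (1.0 + np.exp(-s))

def d_sigmoid(s):
    z = sigmoid(s)
    return z * (1.0 - z)              # local gradient; ~0 when z saturates at 0 or 1

def tanh(s):
    return np.tanh(s)

def d_tanh(s):
    return 1.0 - np.tanh(s) ** 2      # also ~0 when tanh saturates at -1 or 1

def relu(s):
    return np.maximum(0.0, s)

def d_relu(s):
    return (s > 0).astype(float)      # 0 for negative inputs, 1 for positive inputs
```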
Sigmoid
Tanh
Vanishing gradient
If your weight matrix W is initialised too large, the output of the matrix multiply can have a very large range (e.g. numbers between -400 and 400), which makes all outputs in the vector z almost binary: either 1 or 0. But in that case z*(1-z), which is the local gradient of the sigmoid nonlinearity, becomes zero ("vanishes") in both cases, making the gradients for both x and W zero. The rest of the backward pass will come out all zero from this point on, due to the multiplication in the chain rule. The same is the case for tanh, as it is just a scaled version of the sigmoid.
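A small demonstration of the effect described above, with weight scales chosen purely for illustration: with overly large weights the pre-activations are huge, the sigmoid outputs saturate at 0 or 1, and the local gradient z*(1-z) collapses towards zero.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=100)

def sigmoid(s):
    return 1.0 / (1.0 + np.exp(-s))

W_reasonable = rng.normal(0.0, 0.1, size=(100, 100))  # sensibly scaled weights
W_too_large = rng.normal(0.0, 10.0, size=(100, 100))  # initialised far too large

for W in (W_reasonable, W_too_large):
    z = sigmoid(W @ x)                 # layer output
    local_grad = z * (1 - z)           # local gradient of the sigmoid
    print((local_grad < 1e-3).mean())  # fraction of neurons whose gradient has
                                       # vanished: tiny for the sensible scale,
                                       # most of the layer for the large weights
```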
ReLU
Unfortunately, ReLU units can be fragile during training and can "die". For example, a large gradient flowing through a ReLU neuron could cause the weights to update in such a way that the neuron will never activate on any datapoint again. If this happens, then the gradient flowing through the unit will forever be zero from that point on. That is, ReLU units can irreversibly die during training. It's like permanent, irrecoverable brain damage. For example, you may find that as much as 40% of your network can be "dead" (i.e. neurons that never activate across the entire training dataset) if the learning rate is set too high.
Experiments: Dropout
In each layer of the neural network, the neurons become dependent on each other. Some neurons gain more influence than others. The dropout layer randomly mutes different neurons. This way each neuron has to make a distinct contribution to the final output. The second popular method to prevent overfitting is applying an L1 or L2 regulariser function on each layer.
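A sketch of (inverted) dropout applied to a layer's activations: during training a random mask mutes some neurons and scales up the survivors, so the layer can be used unchanged at test time. The drop probability of 0.5 is an illustrative choice.

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout(activations, p_drop=0.5, training=True):
    """Inverted dropout: randomly mute neurons during training."""
    if not training:
        return activations                            # no muting at test time
    keep = rng.random(activations.shape) >= p_drop    # which neurons survive
    return activations * keep / (1.0 - p_drop)        # muted neurons output zero

h = rng.normal(size=8)        # some hidden-layer activations
print(dropout(h))             # roughly half the entries are zeroed on each call
```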
Experiments: Regularisation
The neural network with regularisation functions outperforms the one without them. The L2 regulariser punishes functions that are too complex: it measures how much each weight contributes to the final output and then punishes the ones with large coefficients.
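In code, L2 regularisation simply adds a penalty proportional to the squared weights to the loss; the lambda value below is illustrative.

```python
import numpy as np

def l2_regularised_loss(y_hat, y, weights, lam=0.01):
    """Data loss plus an L2 penalty that punishes large coefficients."""
    data_loss = np.sum((y_hat - y) ** 2)
    penalty = lam * np.sum(weights ** 2)   # grows with the magnitude of the weights
    return data_loss + penalty

# The penalty's gradient contribution is 2 * lam * weights, which shrinks every
# weight slightly on each update (also known as weight decay).
```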
Experiments: Batch size
As we see in the result, a large batch size requires fewer cycles but gives more accurate training steps. In comparison, a smaller batch size is more random but takes more steps to compensate for it. A large batch size requires fewer learning steps, but you need more memory and time to compute each step.
Experiments: Learning Rate
The learning rate is often considered one of the most important hyperparameters because of its impact. It controls how much the weights are adjusted at each learning step. If the learning rate is too high, training may never converge, like the large learning rate example above; if it is too low, training takes a very long time. There is no fixed way of designing neural networks. A lot of it comes down to experimentation. Look at what others have done by adding layers and tuning hyperparameters. If you have access to a lot of computing power, you can create programs to design and tune networks for you.
Experiments: Optimiser
As we can see, the adaptive learning-rate methods, i.e. Adagrad, Adadelta, RMSprop, and Adam are
most suitable and provide the best convergence for these scenarios.
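For reference, a sketch of the Adam update rule (the adaptive method most commonly used as a default), applied here to the earlier toy problem; the beta and epsilon values are Adam's usual defaults, and the learning rate is an illustrative choice.

```python
import numpy as np

x = np.array([0, 1, 2, 3, 4], dtype=float)
y = 2 * x

w = 3.0
lr, beta1, beta2, eps = 0.1, 0.9, 0.999, 1e-8   # betas/eps are Adam's usual defaults
m = v = 0.0

for t in range(1, 2001):
    grad = np.sum(2 * (w * x - y) * x)          # gradient of the squared error
    m = beta1 * m + (1 - beta1) * grad          # running average of gradients
    v = beta2 * v + (1 - beta2) * grad ** 2     # running average of squared gradients
    m_hat = m / (1 - beta1 ** t)                # bias corrections
    v_hat = v / (1 - beta2 ** t)
    w -= lr * m_hat / (np.sqrt(v_hat) + eps)    # per-parameter adaptive step size

print(w)                                        # ends up close to the true weight of 2
```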