Neural Network - Optimization DRAFT 3.11
Presented by:
Mahesh S (24PHD0300)
Peter A (24PHD0345)
Ronald Vincent (24PHD0368)
Gaurav Gadhiya (24PHD0377)
Devisri Eriki (24PHD0325)
Neural Networks
A method of computing, based on the interaction of multiple
connected processing elements.
A powerful technique to solve many real-world problems.
The ability to learn from experience in order to improve their
performance.
Ability to deal with incomplete information
An artificial neural network (ANN) may be defined as an
information-processing model that is inspired by the way biological
nervous systems, such as the brain, process information.
This model tries to replicate only the most basic functions of the
brain.
An ANN is composed of a large number of highly interconnected
processing units (neurons) working in unison to solve
specific problems.
Basics Of Neural Network
• Biological approach to AI
• Developed in 1943
• Composed of one or more layers of neurons
• Several types exist; we'll focus on feed-forward and feedback networks
Neurons
(Figure: a biological neuron alongside an artificial neuron)
Each neuron is connected to the others by connection links.
Each connection link is associated with a weight that carries information about the input signal.
This information is used by the neural network to solve a particular problem.
An ANN's collective behavior is characterized by its ability to learn, recall and generalize
training patterns or data, similar to that of the human brain.
ANNs have the capability to model networks of biological neurons as found in the brain.
Thus, the ANN processing elements are called neurons or artificial neurons.
Neural Network Neurons
• Receives n inputs
• Multiplies each input by its weight
• Applies an activation function to the sum of the results
• Outputs the result
Activation function
A non-linear function helps the network capture complex relationships between variables.
Neural network applications
Pattern recognition
Investment analysis
Mobile computing
(Figure: a single artificial neuron with inputs X1, X2, X3, weights W1, W2, W3, a summation unit Σ and output Y)
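A minimal sketch of the computation in the figure: each input is multiplied by its weight, the products are summed, and an activation function is applied to the sum. The sigmoid used here is an assumption (activation functions are discussed below); the numeric inputs and weights are purely illustrative.

import math

def neuron(inputs, weights):
    # Weighted sum of the inputs, followed by a sigmoid activation
    s = sum(x * w for x, w in zip(inputs, weights))
    return 1.0 / (1.0 + math.exp(-s))

# Three inputs x1..x3 and three weights w1..w3 (illustrative values)
y = neuron([0.5, 0.2, 0.1], [0.4, 0.3, 0.9])
print(y)  # a single output between 0 and 1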
Activation Function
• An activation function in a neural network defines how the weighted sum of the inputs is transformed into an
output from a node or nodes in a layer of the network.
• It transforms the input signal into a non-linear output, allowing the network to learn complex relationships
between inputs and outputs.
• Activation functions determine whether a neuron should activate based on the input it receives.
• A sigmoid function, also known as a logistic function, is a mathematical function that outputs a number between
0 and 1. It is commonly used in machine learning, especially in neural networks, as an activation function, with
the formula:
σ(x) = 1 / (1 + e^(-x))
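As a quick check of the formula, here is a small Python sketch of the sigmoid and of its derivative σ(x)(1 - σ(x)), which the back-propagation example below relies on (the function names are ours, not from the slides).

import math

def sigmoid(x):
    # Logistic function: maps any real number into the range (0, 1)
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_derivative(x):
    # d/dx sigmoid(x) = sigmoid(x) * (1 - sigmoid(x)), used when training by back-propagation
    s = sigmoid(x)
    return s * (1.0 - s)

print(sigmoid(0.0))   # 0.5
print(sigmoid(0.15))  # ≈ 0.5374, a value that appears in the worked example below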
Worked Example: Back-Propagation

Given:
Input pattern x_h = [0.1, 0.3]^T
Target T_q = 0.8
Weights between input and hidden layer:
[w_hp.j] = [0.2  0.3
            0.1  0.4]
Weights between output and hidden layer:
[w_pq.k] = [0.2
            0.3]

Forward pass:
I_p.j = [w_hp.j]^T x_h = [0.2  0.1; 0.3  0.4] [0.1; 0.3] = [0.05; 0.15]

O_p.j = 1 / (1 + exp(-I_p.j)) = [1 / (1 + exp(-0.05)); 1 / (1 + exp(-0.15))] = [0.5125; 0.5374]

I_q.k = [w_pq.k]^T O_p.j = [0.2  0.3] [0.5125; 0.5374] = 0.2637

Output O_q.k = 1 / (1 + exp(-I_q.k)) = 1 / (1 + exp(-0.2637)) = 0.5655

Error E = (T_q - O_q.k)^2 = (0.8 - 0.5655)^2 = 0.0550

Modification of weights between output and hidden layer
Let the learning rate η = 0.6.

δ_q.k = 2 (T_q - O_q.k) O_q.k (1 - O_q.k) = 2 (0.8 - 0.5655)(0.5655)(1 - 0.5655) = 0.1152

Δw_pq.k = η δ_q.k O_p.j = 0.6 × 0.1152 × [0.5125; 0.5374] = [0.0354; 0.0371]

w_pq.k(N+1) = w_pq.k(N) - Δw_pq.k = [0.2 - 0.0354; 0.3 - 0.0371] = [0.1646; 0.2629]

Modification of weights between input and hidden layer

Δw_hp.j = η Σ_{q=1..r} [2 (T_q - O_q.k) O_q.k (1 - O_q.k) w_pq.k] O_p.j (1 - O_p.j) x_h

First combine the output-layer error with the hidden-to-output weights:
w_pq.k δ_q.k = [0.2; 0.3] × 0.1152 = [0.0230; 0.0346]

Let DD = [w_pq.k δ_q.k] O_p.j (1 - O_p.j)
       = [0.0230 × 0.5125 × (1 - 0.5125); 0.0346 × 0.5374 × (1 - 0.5374)]
       = [0.0057; 0.0086]

Let HH = x_h DD^T = [0.1; 0.3] [0.0057  0.0086]
       = [0.0006  0.0009
          0.0017  0.0026]

Δw_hp.j = η HH = 0.6 × HH = [0.0003  0.0005
                             0.0010  0.0015]

w_hp.j(N+1) = w_hp.j(N) - Δw_hp.j = [0.2 - 0.0003  0.3 - 0.0005
                                     0.1 - 0.0010  0.4 - 0.0015]
            = [0.1997  0.2995
               0.0990  0.3985]
With the updated weights, the error is calculated again. Iterations are carried out until the error is reduced to an acceptable minimum.
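The same arithmetic can be reproduced with a few lines of NumPy. This is a minimal sketch, not the slides' code: the array layout and variable names are assumptions chosen so the matrix products match the values shown above, and only the forward pass, error and weight-correction terms are computed.

import numpy as np

# Data and initial weights from the worked example above
x    = np.array([0.1, 0.3])            # input pattern x_h
t_q  = 0.8                             # target T_q
w_hp = np.array([[0.2, 0.3],           # input -> hidden weights [w_hp.j]
                 [0.1, 0.4]])
w_pq = np.array([0.2, 0.3])            # hidden -> output weights [w_pq.k]
eta  = 0.6                             # learning rate

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Forward pass
i_p = w_hp.T @ x                       # [0.05, 0.15]
o_p = sigmoid(i_p)                     # ≈ [0.5125, 0.5374]
i_q = w_pq @ o_p                       # ≈ 0.2637
o_q = sigmoid(i_q)                     # ≈ 0.5655
error = (t_q - o_q) ** 2               # ≈ 0.0550

# Weight-correction terms (backward pass)
delta_q = 2 * (t_q - o_q) * o_q * (1 - o_q)    # ≈ 0.1152
dw_pq   = eta * delta_q * o_p                  # ≈ [0.0354, 0.0371]
dd      = (w_pq * delta_q) * o_p * (1 - o_p)   # error term pushed back to the hidden layer
dw_hp   = eta * np.outer(x, dd)                # ≈ [[0.0003, 0.0005], [0.0010, 0.0015]]

print(error, dw_pq, dw_hp)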
1. Gradient Descent
2. Stochastic Gradient Descent (SGD)
3. Mini-Batch Stochastic Gradient Descent (MB-SGD)
4. SGD with momentum
5. Nesterov Accelerated Gradient (NAG)
6. Adaptive Gradient (AdaGrad)
7. AdaDelta
8. RMSprop
9. Adam
LINEAR REGRESSION
Imagine we have data containing the heights and weights of thousands of people. We want to use this data to
create a Machine Learning model that takes the height of a person as input and predicts the weight of the
person.
Linear regression performs the task of predicting a dependent variable value (y) based on a given independent
variable (x); hence the name Linear Regression.
This kind of relationship between the input feature(height) and output feature(weight) can be captured by a
linear regression model that tries to fit a straight line on this data.
The line is Y = mx + c, where Y is the output feature (weight), m is the slope of the line, x is the input
feature (height) and c is the intercept (weight equals c when height is 0, since Y = m(0) + c = c).
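As a small illustration, the fitted line is used for prediction like this (the slope and intercept values are made up, not fitted to any dataset):

def predict_weight(height_cm, m, c):
    # Linear model: predicted weight = slope * height + intercept
    return m * height_cm + c

# Illustrative parameters, not fitted values
print(predict_weight(170, 0.9, -90))   # about 63 kg for a 170 cm person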
COST FUNCTION – MEAN SQUARE ERROR - 1
We need a cost function so we can start optimizing our weights.
Let’s use MSE(L2) as our cost function. MSE measures the average squared difference between an
observation’s actual and predicted values. The output is a single number representing the cost, or score,
associated with our current set of weights. Our goal is to minimize MSE to improve the accuracy of our
model.
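A minimal sketch of MSE as described above (the function name and the sample values are illustrative assumptions):

def mse(y_true, y_pred):
    # Average squared difference between actual and predicted values
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

print(mse([63, 70, 55], [61, 72, 58]))   # a single number: the cost of the current weights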
COST FUNCTION – MEAN SQUARE ERROR - 2
Gradient descent works by iteratively adjusting the weights or parameters of the model in the direction of the
negative gradient of the cost function until the minimum of the cost function is reached.
The cost function evaluates the difference between the actual and predicted outputs.
It trains machine learning models by minimizing errors between predicted and actual results.
The goal of gradient descent is to minimize the cost function, or the error between predicted and actual y. In
order to do this, it requires two data points—a direction and a learning rate. These factors determine the partial
derivative calculations of future iterations, allowing it to gradually arrive at the local or global minimum (i.e.
point of convergence).
The cost function of linear regression (MSE) is a convex function, i.e. it has only one minimum across the range of
values of the slope 'm' and the constant 'c'.
Learning rate (also referred to as step size or the alpha) is the size of the steps that are taken to reach the
minimum. This is typically a small value, and it is evaluated and updated based on the behavior of the cost
function. High learning rates result in larger steps but risk overshooting the minimum. Conversely, a low
learning rate takes small steps. While this gives more precision, it compromises overall efficiency, since
reaching the minimum takes more iterations, time and computation.
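Putting the pieces together, here is a sketch of batch gradient descent for the height-weight line y = m*x + c with an MSE cost. The learning rate, epoch count and data are illustrative assumptions; the small learning rate keeps the steps stable for unscaled heights.

def gradient_descent(x, y, lr=0.00001, epochs=5000):
    # Batch gradient descent: every step uses the full dataset to compute the gradients
    m, c, n = 0.0, 0.0, len(x)
    for _ in range(epochs):
        y_pred = [m * xi + c for xi in x]
        # Partial derivatives of MSE with respect to m and c
        dm = (-2.0 / n) * sum(xi * (yi - yp) for xi, yi, yp in zip(x, y, y_pred))
        dc = (-2.0 / n) * sum(yi - yp for yi, yp in zip(y, y_pred))
        m -= lr * dm   # step in the direction of the negative gradient
        c -= lr * dc
    return m, c

# Illustrative heights (cm) and weights (kg)
m, c = gradient_descent([150, 160, 170, 180], [50, 57, 64, 71])
print(m, c)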
For big data (huge datasets), batch gradient descent is slow. Hence, in most scenarios, SGD is preferred over Batch
Gradient Descent for optimizing a learning algorithm: it can reach the neighbourhood of the minimum in a short time,
since each update is far cheaper to compute.
Stochastic Gradient Descent (SGD) is a variant of the gradient descent algorithm that is used for optimizing
machine learning models. It addresses the computational inefficiency of traditional Gradient Descent methods
when dealing with large datasets in machine learning projects.
In SGD, instead of using the entire dataset for each iteration, only a single random training example (or a small
batch) is selected to calculate the gradient and update the model parameters.
This random selection introduces randomness into the optimization process, hence the term "stochastic" in
Stochastic Gradient Descent.
The advantage of using SGD is its computational efficiency, especially when dealing with large datasets. By
using a single example or a small batch, the computational cost per iteration is significantly reduced compared
to traditional Gradient Descent methods that require processing the entire dataset.
Stochastic Gradient Descent randomly picks one sample for each step; here the number of terms calculated is
reduced by a factor of 3. Stochastic Gradient Descent is used when a redundant data set is available.
(Figure: weight plotted against height for the sample data)
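A sketch of a single SGD step for the same height-weight line: the gradient is estimated from one randomly chosen sample rather than the whole dataset. The names and the learning rate are illustrative assumptions.

import random

def sgd_step(m, c, x, y, lr=0.00001):
    # Pick one random (height, weight) observation and update m and c from it alone
    i = random.randrange(len(x))
    xi, yi = x[i], y[i]
    err = yi - (m * xi + c)
    m += lr * 2 * err * xi   # move against the gradient of this sample's squared error
    c += lr * 2 * err
    return m, c

# One noisy update; in practice many such steps are taken over the shuffled data
m, c = sgd_step(0.0, 0.0, [150, 160, 170, 180], [50, 57, 64, 71])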
Aspect | Stochastic Gradient Descent (SGD) | Batch Gradient Descent
Dataset Usage | Uses a single random sample or a small batch of samples at each iteration. Less accurate. | Uses the entire dataset (batch) at each iteration. More accurate.
Computational Efficiency | Computationally less expensive per iteration, as it processes fewer data points. | Computationally more expensive per iteration, as it processes the entire dataset.
Noise in Updates | High noise due to frequent updates with a single or few samples. | Low noise, as it updates parameters using all data points.
Memory Requirement | Requires less memory, as it processes fewer data points at a time. | Requires more memory to hold the entire dataset in memory.
Update Frequency | Frequent updates make it suitable for online learning and large datasets. | Less frequent updates make it suitable for smaller datasets.
Mini-batch gradient descent combines concepts from both batch gradient descent and stochastic
gradient descent. It splits the training dataset into small batch sizes and performs updates on each of
those batches. This approach strikes a balance between the computational efficiency of batch gradient
descent and the speed of stochastic gradient descent.
Let's say there are a total of 'm' observations in a data set. If we use all of these observations to calculate the
cost function J, this is known as Batch Gradient Descent.
In Stochastic Gradient Descent, we instead take one observation at a time: the cost and the parameter updates are
computed for the first observation, then for the second observation, and so on. This is repeated until all
observations have been passed through the network and the parameters have been updated.
In Mini-Batch Gradient Descent, assume that the batch size is 2: the observations are processed two at a time, and
if 'm' is odd the final iteration is left with only the last single observation (as in the sketch below).
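A sketch of how the dataset can be split into mini-batches for MB-SGD, with batch size 2 as in the example above (the helper name is ours):

import random

def minibatch_indices(n_samples, batch_size=2):
    # Shuffle the sample indices and split them into mini-batches;
    # the final batch may hold only the last single observation
    idx = list(range(n_samples))
    random.shuffle(idx)
    return [idx[i:i + batch_size] for i in range(0, n_samples, batch_size)]

print(minibatch_indices(5))   # e.g. [[3, 0], [4, 1], [2]]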
IMAGE GRADIENT – IMAGE PROCESSING APPLICATION
Gradient of an image = a measure of change in the image function F(x, y) along x (rows) and y (columns), as sketched below.
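One simple way to approximate the image gradient is by finite differences between neighbouring pixels; this NumPy sketch (our own, not from the slides) returns the change along rows and along columns.

import numpy as np

def image_gradient(img):
    # Finite-difference approximation of the change in F(x, y)
    img = img.astype(float)
    gx = np.diff(img, axis=0)   # change between adjacent rows (x direction)
    gy = np.diff(img, axis=1)   # change between adjacent columns (y direction)
    return gx, gy

img = np.array([[10, 10, 50],
                [10, 20, 80],
                [10, 30, 90]])
gx, gy = image_gradient(img)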
Case study: Medical Science
Does a person have prostate cancer or not?
Healthy/Cancer | PSA (prostate-specific antigen)
Cancer | 3.8
Healthy | 2.5
• Advantage: Faster than batch gradient descent due to updates after each
sample, allowing more responsive performance on hardware accelerators.
• Limitation: May oscillate due to noisy updates, but this can be managed with
small learning rates or other techniques (like momentum).
3. Mini-Batch Stochastic Gradient Descent (MB-SGD) Application:
• Limitation: Requires careful tuning of batch size and learning rate, which
can impact memory usage on FPGA or ASIC hardware.
4. SGD with Momentum Application:
• Suitable for biomedical imaging tasks that require high accuracy and smooth
convergence, like tumor boundary identification in MRI scans.
• Limitation: Requires more memory for storing squared gradients and delta
terms, which can be demanding on hardware resources.
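For reference, one common formulation of the SGD-with-momentum update mentioned above (the parameter values are typical defaults, not taken from the slides):

def momentum_step(w, grad, velocity, lr=0.01, beta=0.9):
    # Blend the current gradient with an exponentially decaying average of past updates
    velocity = beta * velocity - lr * grad
    w = w + velocity   # the accumulated "velocity" smooths the descent
    return w, velocity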
9. Adam (Adaptive Moment Estimation) Application:
• Widely used for deep learning tasks in biomedical image processing, such as convolutional
neural networks for cancer detection in histopathology images or segmentation in
radiology.
• Advantage: Combines the benefits of RMSprop and SGD with momentum, offering fast
convergence and good accuracy, making it ideal for high-performance hardware
applications.
• Limitation: Computationally intensive, requiring more complex logic and memory storage
for moment terms, so it works best on advanced hardware like GPUs or high-capacity
FPGAs.
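For reference, a sketch of the Adam update for a single parameter, combining a momentum-like first moment with an RMSprop-like second moment (the hyper-parameter values are the usual defaults, not taken from the slides):

import math

def adam_step(w, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    # First moment: running mean of gradients (momentum-like)
    m = beta1 * m + (1 - beta1) * grad
    # Second moment: running mean of squared gradients (RMSprop-like)
    v = beta2 * v + (1 - beta2) * grad ** 2
    # Bias correction for the first steps (t starts at 1)
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    w -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return w, m, v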
THANK YOU