Artificial Neural Network (ANN)
• ANN is a computational model that is inspired by the way biological neural networks
in the human brain process information.
• It is a method of computing, based on the interaction of multiple connected processing
elements.
• The depth of the model generally specifies the number of hidden layers.
• The learning of a model is the process of finding appropriate weights, typically through backpropagation.
Artificial Neural Network (ANN)
• Deep Learning (DL) is an umbrella term representing recent advances in neural networks (NNs).
• DL mimics the human brain, processing information at multiple levels of abstraction from high-dimensional raw data.
• DNNs mimic the efficiency and robustness of the human brain in processing massive data.
A Single Neuron
• The basic unit of computation in a
neural network is the neuron, often
called a node or unit.
• It receives input from some other
nodes, or from an external source
and computes an output.
• Each input has an
associated weight (w), which is
assigned on the basis of its relative
importance to other inputs.
• The node applies a function f to the
weighted sum of its inputs.
A Single Neuron
• This network takes numerical inputs X1 and X2 and has
weights w1 and w2 associated with those inputs. Additionally, there is another
input 1 with weight b (called the Bias) associated with it.
• The output Y from the neuron is computed using the function f called
the Activation Function. The purpose of the activation function (non-linear) is to
introduce non-linearity into the output of a neuron.
• This is important because most real-world data is non-linear and we want
neurons to learn these non-linear representations.
• Every activation function (or non-linearity) takes a single number and performs a
certain fixed mathematical operation on it. There are several activation functions.
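A minimal sketch of a single neuron in Python (the input values, weights, and the choice of sigmoid here are illustrative assumptions, not values from the slides):

```python
import numpy as np

def sigmoid(z):
    # Sigmoid activation: squashes any real number into (0, 1).
    return 1.0 / (1.0 + np.exp(-z))

# Illustrative inputs, weights, and bias.
x = np.array([0.5, -1.2])   # inputs X1, X2
w = np.array([0.8, 0.3])    # weights w1, w2
b = 0.1                     # bias weight b on the constant input 1

# Y = f(w1*X1 + w2*X2 + b), with f = sigmoid here.
y = sigmoid(np.dot(w, x) + b)
print(y)
```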
Selecting Weights
• The initialization of the parameters considerably affects the convergence of
backpropagation during the training of DNN.
• Too small or large values of weights lead to slow learning and divergence respectively.
• The rule of thumb is that the mean of the activations* should be zero and the variance** should remain the same across layers.
• This applies to both forward propagation (i.e., the activations) and backward propagation (i.e., the gradients of the cost w.r.t. the activations).
* If activations are centred around zero, gradient updates are more balanced, preventing bias in weight
updates.
** If variance shrinks, the signal weakens, leading to vanishing gradients. If variance explodes, activations
become large, leading to unstable training.
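As a sketch of these rules, two widely used initialization schemes (Xavier/Glorot for sigmoid/tanh, He for ReLU) scale the random weights by the layer's fan-in and fan-out; the layer sizes below are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

def xavier_init(fan_in, fan_out):
    # Xavier/Glorot: keeps activation variance roughly constant
    # across layers (suited to sigmoid/tanh).
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-limit, limit, size=(fan_in, fan_out))

def he_init(fan_in, fan_out):
    # He: variance 2/fan_in compensates for ReLU zeroing
    # half of the activations (suited to ReLU).
    return rng.normal(0.0, np.sqrt(2.0 / fan_in), size=(fan_in, fan_out))

W1 = he_init(784, 256)      # hidden layer with ReLU (sizes illustrative)
W2 = xavier_init(256, 10)   # output layer with sigmoid/softmax
```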
Selecting Hidden Layers
• It is difficult to find the appropriate number of hidden layers and of neurons per hidden layer.
• Start with one or two hidden layers and thereafter, gradually increase the layers
depending upon the complexity of the problem, until the model starts
experiencing overfitting.
• A model with few layers but a large number of neurons takes more time to learn than a model with more layers but fewer neurons.
• In general, it is advisable to increase the number of layers rather than the number of neurons per layer.
Selecting Neurons
• The number of neurons in the input and output layers depends upon the size of the input feature vector and the classification problem.
• For hidden layers, it is a common practice to choose the number of neurons so that they form a funnel, i.e., placing fewer and fewer neurons in succeeding layers (see the sketch below).
• A good practice is to start with fewer neurons and gradually increase their count until the architecture starts to overfit.
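A sketch of such a funnel architecture in Keras (the input size and layer widths are illustrative assumptions):

```python
from tensorflow import keras
from tensorflow.keras import layers

# Funnel: each hidden layer has fewer neurons than the previous one.
inputs = keras.Input(shape=(20,))                   # assumed 20 input features
x = layers.Dense(64, activation="relu")(inputs)
x = layers.Dense(32, activation="relu")(x)
x = layers.Dense(16, activation="relu")(x)
outputs = layers.Dense(1, activation="sigmoid")(x)  # binary classification
model = keras.Model(inputs, outputs)
model.summary()
```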
Perceptron
• One of the simplest ANN architectures, primarily used for binary classification; introduced by Frank Rosenblatt (1957).
• A simple neuron that is used to classify its input into one of two categories.
• Logistic regression (LR) uses the sigmoid function, whereas the perceptron uses the step function.
Perceptron
• At x = 0, the step function has an abrupt jump (discontinuity).
• The function is not differentiable at x = 0, as the left and right derivatives are not equal.
• Instead, functions like sigmoid, ReLU, and tanh are used in modern neural
networks.
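A minimal sketch of Rosenblatt's perceptron with the step function, trained here on the AND function as a toy linearly separable problem (the data, learning rate, and epoch count are illustrative):

```python
import numpy as np

def step(z):
    # Step function: 1 if z >= 0 else 0 (not differentiable at z = 0).
    return np.where(z >= 0, 1, 0)

def perceptron_train(X, y, lr=0.1, epochs=10):
    # Perceptron learning rule: nudge weights by the prediction error.
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            y_hat = step(np.dot(w, xi) + b)
            w += lr * (yi - y_hat) * xi
            b += lr * (yi - y_hat)
    return w, b

# Toy example: learn the AND function (linearly separable).
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 0, 0, 1])
w, b = perceptron_train(X, y)
print(step(X @ w + b))   # expected: [0 0 0 1]
```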
Activation Functions
• The activation function decides whether a neuron should be activated by calculating the weighted sum of its inputs and adding the bias to it.
• The activation function applies a non-linear transformation to the input, making the network capable of learning and performing more complex tasks. It is also known as the Transfer Function.
Activation Functions
• Suppose we have a neural network without activation functions: the network behaves like a linear model, which significantly limits its capabilities.
• Useless deep layers: Every neuron will only be performing a linear transformation on the
inputs using the weights and biases. It’s because it doesn’t matter how many hidden
layers we attach in the neural network; all layers will behave in the same way because
the composition of two linear functions is a linear function itself.
• Although the neural network becomes simpler, learning any complex task is impossible,
and our model would be just a linear regression model.
• The Activation Functions can be divided into 2 types:
• Linear Activation Function
• Non-linear Activation Functions
Linear Activation Function
• Range: −∞ to +∞
• The output is proportional to the input, so stacking linear layers still yields a linear mapping.
Non-Linear Activation Functions | Sigmoid
• σ(x) = 1 / (1 + exp(−x)), Range: 0 to 1
• Limitation: for large positive or negative inputs, the gradient of the sigmoid approaches zero (vanishing gradient), which causes:
• Slow Learning: Weight updates become insignificant, making training very slow.
• Early Layers Stop Learning: The initial layers of the network stop updating effectively, leading to poor feature learning.
Non-Linear Activation Functions | Tanh
• Benefits
• Zero-centered output: Unlike Sigmoid, which outputs in the range (0, 1), Tanh outputs in (−1, 1), making it easier for weights to update symmetrically.
Non-Linear Activation Functions | ReLU
• ReLU(x) = max(0, x)
• Limitation
• When inputs approach zero or are negative, the gradient of the function becomes zero, so the network cannot perform backpropagation through those units and cannot learn. This is known as the Dying ReLU problem.
• If too many neurons output 0, they stop learning because the gradient is 0 for x ≤ 0.
Saturation
• An activation function saturates when its output reaches a limit, causing its derivative to become very small (close to zero).
• Sigmoid saturates at 0 and 1.
• Tanh saturates at −1 and 1.
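A sketch of these activation derivatives in numpy, showing saturation and the dying-ReLU behavior numerically (the sample inputs are illustrative):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1 - s)             # ~0 for large |x|: saturation

def tanh_grad(x):
    return 1.0 - np.tanh(x) ** 2   # ~0 for large |x|: saturation

def relu_grad(x):
    return (x > 0).astype(float)   # exactly 0 for x <= 0: dying ReLU

xs = np.array([-10.0, -1.0, 0.0, 1.0, 10.0])
print(sigmoid_grad(xs))   # tiny at +/-10 (saturated)
print(tanh_grad(xs))      # tiny at +/-10 (saturated)
print(relu_grad(xs))      # zero for all non-positive inputs
```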
Backpropagation Algorithm
• Backward Propagation of Errors is one of several ways in which an ANN can be trained. It is a supervised training scheme.
• An ANN consists of nodes in different layers; input layer, hidden layer(s) and the
output layer. The connections between nodes of adjacent layers have “weights”
associated with them.
• The goal of learning is to assign correct weights for these edges. Given an input
vector, these weights determine what the output vector is.
Backpropagation Algorithm
• Initialization: Initially, all the edge weights are randomly assigned.
• Forward Propagation: Compute the network's output for a training input using the current weights.
• Error Calculation: Compare the predicted output with the actual output.
• Weight Update: Propagate the error backwards through the network and adjust the weights to reduce it.
• Repeat Until Convergence: Continue iterating until the error is sufficiently small.
MLP Example
• Suppose we have the following student-marks dataset:
• Now suppose we want to predict whether a student studying 25 hours and having 70 marks in the mid-term will pass the final term.
• This is a binary classification problem where MLP can learn from the given examples
(training data) and make an informed prediction given a new data point.
Neural Network Architecture
• Input Layer: Two neurons (Hours Studied, Mid Term Marks)
• Activation Functions:
• Hidden Layer: ReLU (Rectified Linear Unit)
• Output Layer: Softmax (so the two output nodes produce class probabilities)
• Let's consider the hidden layer node marked V. Assume the weights of the connections from the inputs to that node are w1, w2 and w3.
• The network then takes the first training example as input, i.e., [35, 67], and computes the node's output as V = f(1·w1 + 35·w2 + 67·w3), where:
w1 is the bias weight
w2 is the weight for Hours Studied
w3 is the weight for Mid Term Marks
f is the activation function (ReLU in the hidden layer)
Similarly, the output from the other node/neuron in the hidden layer is also calculated. The outputs of the two nodes in the hidden layer act as inputs to the two nodes in the output layer, which lets us compute output probabilities from the two output nodes.
Step 1: Forward Propagation (Computation of Outputs)
• Compute outputs from hidden layer neurons
• Each hidden neuron receives inputs, applies a weighted sum, and passes it through an
activation function.
For the given input [35, 67], suppose the predicted probabilities are: [0.4, 0.6]
This means the network predicts: Pass probability = 0.4 & Fail probability = 0.6
Step 1: Forward Propagation (Computation of Outputs)
However, the actual label is [1, 0] (meaning the student should Pass). Thus, we have an
incorrect prediction with a high error.
Step 2: Backpropagation and Weight Updation
• We calculate the total error at the output nodes and propagate these errors
back through the network using Backpropagation to calculate the gradients.
• Suppose that the new weights associated with the node in consideration are
w4, w5 and w6 (after backpropagation and adjusting weights).
Step 2: Backpropagation and Weight Updation
If we now input the same example to the network again, the network should perform
better than before since the weights have now been adjusted to minimize the error in
prediction.
As shown in figure, the errors at the output nodes now reduce to [0.2, -0.2] as compared
to [0.6, -0.4] earlier. This means that our network has learnt to correctly classify our first
training example.
Step 2: Backpropagation and Weight Updation
• We repeat this process with all other training examples in our dataset. Then, our
network is said to have learnt those examples.
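A minimal numpy sketch of this forward propagation / backpropagation / weight update loop for a 2-2-2 network on the first training example. The input scaling, initial weights, learning rate, and the softmax/cross-entropy pairing are illustrative assumptions; the slides do not give concrete values:

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(z):
    return np.maximum(0.0, z)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# 2 inputs -> 2 hidden (ReLU) -> 2 outputs (Pass/Fail probabilities).
W1, b1 = rng.normal(0, 0.5, (2, 2)), np.zeros(2)   # random initialization
W2, b2 = rng.normal(0, 0.5, (2, 2)), np.zeros(2)

x = np.array([35.0, 67.0]) / 100.0   # first example, scaled (our assumption)
t = np.array([1.0, 0.0])             # actual label: [Pass, Fail]

lr = 0.5
for _ in range(100):
    # Step 1: forward propagation
    z1 = W1 @ x + b1
    h = relu(z1)
    y = softmax(W2 @ h + b2)
    # Step 2: backpropagation (softmax + cross-entropy gradient is y - t)
    d2 = y - t
    dW2, db2 = np.outer(d2, h), d2
    d1 = (W2.T @ d2) * (z1 > 0)      # gradient through ReLU
    dW1, db1 = np.outer(d1, x), d1
    # Weight update by gradient descent
    W2 -= lr * dW2; b2 -= lr * db2
    W1 -= lr * dW1; b1 -= lr * db1

print(y)   # the predicted probabilities move toward the target [1, 0]
```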
Gradient Descent algorithm
• Step 1: Initialize the weights (a & b) with random values and calculate Error (SSE)
• Step 2: Calculate the gradient, i.e., change in SSE when the weights (a & b) are changed by a very
small value from their original randomly initialized value. This helps us move the values of a & b in
the direction in which SSE is minimized.
• Step 3: Adjust the weights with the gradients to reach the optimal values where SSE is minimized.
a = a – r * ∂(SSE)/∂a, b = b – r * ∂(SSE)/∂b
• Step 4: Use the new weights for prediction and to calculate the new SSE
• Step 5: Repeat steps 2–4 until further adjustments to the weights no longer significantly reduce the error.
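A sketch of these five steps for a simple linear model y = a·x + b with SSE loss (the toy data and learning rate are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data, roughly y = 2x + 1.
X = np.array([1.0, 2.0, 3.0, 4.0])
Y = np.array([3.1, 4.9, 7.2, 8.8])

a, b = rng.standard_normal(2)   # Step 1: random initialization
r = 0.01                        # learning rate

for _ in range(5000):
    err = (a * X + b) - Y          # SSE = sum(err**2)
    grad_a = 2 * np.sum(err * X)   # Step 2: ∂(SSE)/∂a
    grad_b = 2 * np.sum(err)       #         ∂(SSE)/∂b
    a -= r * grad_a                # Step 3: adjust the weights
    b -= r * grad_b                # Steps 4-5: new SSE on next iteration

print(a, b)   # should approach roughly 2 and 1
```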
Learning Rate
The size of the steps taken toward the optimal point determines the rate of gradient descent. This step size is referred to as the 'learning rate'.
Too big
The updates bounce back and forth across the valley of the convex function and may never reach the local minimum.
Too small
Gradient descent will eventually reach the local minimum, but it will take too much time.
Just right
Gradient descent reaches the local minimum in a reasonable number of steps.
Batch Gradient Descent
In Batch Gradient Descent, all the training data is taken into consideration to take a single
step. We take the average of the gradients of all the training examples and then use that mean
gradient to update our parameters. So that’s just one step of gradient descent in one epoch.
How it Works:
• Computes the gradient of the loss function using the entire dataset.
• Updates the parameters (weights and biases) in the direction that minimizes the loss.
• Repeats until convergence.
Batch Gradient Descent
But what if our dataset is very large? Suppose our dataset has 5 million examples; then, just to take one step, the model has to calculate the gradients of all 5 million examples. This is not efficient. To tackle this problem we have Stochastic Gradient Descent.
Epochs and Iterations
Example:
If you have 1000 samples and a batch size of 100, then:
• 1 epoch = 10 iterations (since 1000/100 = 10 batches).
• If training runs for 10 epochs, the model sees each data point 10 times.
Stochastic Gradient Descent
How It Works:
• Instead of computing the gradient using the entire dataset (like Batch Gradient Descent), SGD picks one random sample per iteration.
• The model updates its parameters immediately after computing the gradient for that single sample.
• This process continues until all samples have been used, completing one epoch.
• The cost decreases with fluctuations.
• Because the cost fluctuates, it may never settle exactly at the minimum; instead it keeps oscillating around it.
• SGD can be used for larger datasets. It
converges faster when the dataset is large
as it causes updates to the parameters
more frequently.
• Since SGD uses only one example at a time, we cannot vectorize the computation across examples. This can slow down the computations.
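A minimal SGD sketch for linear regression, updating the weights after every single sample (the function name and parameters are illustrative):

```python
import numpy as np

def sgd(X, Y, lr=0.01, epochs=10, seed=0):
    # Stochastic gradient descent: one random sample per update,
    # so updates are frequent but noisy.
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(epochs):
        for i in rng.permutation(n):   # visit samples in random order
            err = X[i] @ w - Y[i]
            w -= lr * 2 * err * X[i]   # single-sample gradient step
    return w
```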
Mini Batch Gradient Descent
• Mini-Batch Gradient Descent is a compromise between Batch Gradient Descent (BGD)
and Stochastic Gradient Descent (SGD). Instead of updating weights after every sample
(SGD) or after the entire dataset (BGD), MBGD updates weights after processing a small
batch of samples.
• So, when using mini-batch gradient descent, we update our parameters frequently and can also use a vectorized implementation for faster computation.
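A mini-batch sketch in the same style; note that batch_size = n recovers Batch GD and batch_size = 1 recovers SGD (the parameters are illustrative):

```python
import numpy as np

def minibatch_gd(X, Y, batch_size=100, lr=0.01, epochs=10, seed=0):
    # Mini-batch gradient descent: average the gradient over a small
    # batch, so each update is vectorized yet still frequent.
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(epochs):
        idx = rng.permutation(n)                 # reshuffle each epoch
        for start in range(0, n, batch_size):
            batch = idx[start:start + batch_size]
            err = X[batch] @ w - Y[batch]
            grad = 2 * (X[batch].T @ err) / len(batch)   # mean batch gradient
            w -= lr * grad                       # one update per mini-batch
    return w
```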
Differences Between Batch, Stochastic, and Mini-Batch
Gradient Descent
Key Takeaways: Batch, Stochastic, and Mini-Batch
Gradient Descent
• Batch GD: Stable but slow, inefficient for large datasets.
• Stochastic GD: Fast, frequent updates, but noisy; the cost fluctuates around the minimum.
• Mini-Batch GD: Best balance between speed and stability, commonly used in deep learning.
How a Neural Network is Trained