
ARTIFICIAL NEURON MODEL
SESSION - 5
CONTENT

• Rosenblatt’s Perceptron Model


• Case study
• Minsky and Papert Model
• Summary
MCCULLOCH-PITTS MODEL

• The first computational model of a neuron was proposed by Warren McCulloch and Walter Pitts in 1943.
• The simplest binary classification can be achieved as follows: the neuron fires (y = 1) if the sum of its binary inputs reaches the threshold θ, i.e. y = 1 if Σ xi ≥ θ, else y = 0.
LIMITATIONS

• It cannot process non-Boolean inputs.


• It gives equal weights to each input.
• The threshold θ must be chosen manually.
PERCEPTRON MODEL

• Introduced in 1957 by Frank Rosenblatt.
• It forms the core of many deep learning concepts.
• A perceptron is a single-layer neural network.
• A perceptron can be seen as a set of inputs that are weighted, summed, and passed through an activation function.
• This weighted sum of inputs, after the activation function, produces the output.
• It is typically used for classification problems, but can also be used for regression.
PERCEPTRON MODEL

• We attach a weight (wi) to each input, and we also add an input of constant value 1 with weight −θ; this extra term is called the bias.
• The inputs can be seen as neurons and together form the input layer. These neurons and the activation function together form a perceptron.
• The binary classification function of the perceptron network is represented as
  y = 1 if Σ wi·xi − θ ≥ 0, else y = 0
PERCEPTRON MODEL

• Schematic Representation
PERCEPTRON MODEL

• The perceptron belongs to the class of single-layer feedforward networks.
• It is a supervised learning approach, mostly used for classification tasks but also applicable to regression.
• There are two types: the single-layer perceptron and the multi-layer perceptron.
PERCEPTRON MODEL

• The output of the perceptron network is computed as follows (see the sketch below):
• yin = b + x1w1 + x2w2 + x3w3 + … + xnwn
• yout = f(yin)
• f(yin) = 1 if yin > θ; 0 if −θ ≤ yin ≤ θ; −1 if yin < −θ
• f(yin) is the step activation function.
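As an illustration, a minimal Python sketch of this computation (the threshold value θ = 0.2 and the bipolar step function are assumptions for the example; names are illustrative):

    import numpy as np

    def step(y_in, theta=0.2):
        # Bipolar step activation: +1 above the threshold, -1 below, 0 in between.
        if y_in > theta:
            return 1
        elif y_in < -theta:
            return -1
        return 0

    def perceptron_output(x, w, b, theta=0.2):
        # yin = b + sum_i xi*wi, then pass through the step activation.
        y_in = b + np.dot(x, w)
        return step(y_in, theta)

    # Example: two bipolar inputs with chosen weights and bias.
    print(perceptron_output(np.array([1, -1]), np.array([0.5, 0.5]), b=-0.5))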
PERCEPTRON MODEL

• Compare yout with the target output t.
• Weight update:
• If y ≠ t, then
• w(new) = w(old) + α·t·x
• where α is the learning rate, t the target, and x the input.
• Similarly, the bias update is:
• b(new) = b(old) + α·t
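A direct transcription of this update rule as a small Python function (a sketch, not a library API):

    import numpy as np

    def perceptron_update(w, b, x, t, y, alpha=1.0):
        # Apply the learning rule only when the computed output y differs from the target t.
        if y != t:
            w = w + alpha * t * x    # w(new) = w(old) + alpha * t * x
            b = b + alpha * t        # b(new) = b(old) + alpha * t
        return w, b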
TRAINING ALGORITHM

• Step 0: Initialize the weights and bias; set the learning rate α (here α = 1).
• Step 1: Perform Steps 2 to 5 until the stopping condition is reached.
• Step 2: Perform Steps 3 and 4 for each training pair.
• Step 3: Calculate the output:

  yin = b + Σ xiwi
  y = f(yin) = 1 if yin > θ; 0 if −θ ≤ yin ≤ θ; −1 if yin < −θ
TRAINING ALGORITHM

• Step 4: Adjust the weights and bias.

  If y ≠ t, then
    w(new) = w(old) + α·t·x
    b(new) = b(old) + α·t
  Else:
    w(new) = w(old)
    b(new) = b(old)
• Step 5: Train the network until the stopping condition is reached (no change in weights for any training pair). The sketch below puts these steps together.
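The steps above, written out as a small Python training loop (a sketch assuming a bipolar step activation with threshold θ; the function name train_perceptron is illustrative):

    import numpy as np

    def step(y_in, theta=0.2):
        return 1 if y_in > theta else (-1 if y_in < -theta else 0)

    def train_perceptron(X, T, alpha=1.0, theta=0.2, max_epochs=100):
        w = np.zeros(X.shape[1])              # Step 0: initialize weights and bias
        b = 0.0
        for _ in range(max_epochs):           # Step 1: repeat until the stopping condition
            changed = False
            for x, t in zip(X, T):            # Step 2: for each training pair
                y = step(b + np.dot(x, w), theta)   # Step 3: calculate the output
                if y != t:                    # Step 4: adjust weights and bias on error
                    w = w + alpha * t * x
                    b = b + alpha * t
                    changed = True
            if not changed:                   # Step 5: stop when no weight changed
                break
        return w, b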
CASE STUDY

• Implementation of a two-input AND gate with bipolar inputs using Rosenblatt's perceptron model (a worked sketch follows).
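A self-contained sketch of this case study in Python (bipolar inputs and targets, α = 1; the threshold θ = 0.2 is an assumed value):

    import numpy as np

    def step(y_in, theta=0.2):
        return 1 if y_in > theta else (-1 if y_in < -theta else 0)

    # Bipolar two-input AND gate: output +1 only when both inputs are +1.
    X = np.array([[1, 1], [1, -1], [-1, 1], [-1, -1]])
    T = np.array([1, -1, -1, -1])

    w, b, alpha, theta = np.zeros(2), 0.0, 1.0, 0.2
    for _ in range(10):                       # a few epochs are enough here
        changed = False
        for x, t in zip(X, T):
            y = step(b + np.dot(x, w), theta)
            if y != t:                        # perceptron learning rule
                w, b, changed = w + alpha * t * x, b + alpha * t, True
        if not changed:
            break

    print("weights:", w, "bias:", b)
    print("outputs:", [step(b + np.dot(x, w), theta) for x in X])   # matches T

In this run the rule converges to weights [1, 1] and bias −1, which separates the single positive example from the three negative ones.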
MINSKY AND
PAPERT MODEL
INTRODUCTION

• Marvin Minsky and Seymour Papert are two influential figures in the
field of artificial intelligence and neural networks.
• Their book, "Perceptrons: An Introduction to Computational Geometry,"
published in 1969, is a seminal work that critically analyzed the
capabilities and limitations of the Perceptron model, a simple type of
artificial neural network.
• Their analysis highlighted significant challenges in the field and spurred
the development of more complex neural network architectures.
KEY CONCEPTS

THE PERCEPTRON MODEL

• A single-layer neural network used for binary classification tasks.


• Composed of input units, weights, a bias term, and an activation
function (typically a step function).
• Functions by computing a weighted sum of inputs and passing the
result through the activation function to produce a binary output.
LINEAR SEPARABILITY

• A fundamental concept for understanding the limitations of the Perceptron.
• Refers to the ability of a model to classify data points using a linear decision boundary.
• Problems whose classes can be separated by a straight line (or a hyperplane in higher dimensions) are linearly separable.
MINSKY AND PAPERT'S ANALYSIS

LIMITATIONS OF THE PERCEPTRON

• Inability to Solve Non-Linearly Separable Problems: Minsky and Papert demonstrated that the Perceptron cannot solve problems where data points are not linearly separable. The XOR problem is a classic example.
• XOR Problem: The XOR (exclusive OR) problem involves classifying pairs of binary inputs into two classes. The classes cannot be separated by a single straight line, making it unsolvable by a single-layer Perceptron.
• Limited Computational Power: They argued that the Perceptron's computational power is limited and insufficient for complex tasks.
MATHEMATICAL PROOFS

• Minsky and Papert provided mathematical proofs showing the Perceptron's limitations. They rigorously analyzed the types of problems that could and could not be solved by the Perceptron.
• Their proofs highlighted that any problem requiring a non-linear decision boundary cannot be solved by a single-layer Perceptron (the XOR check below illustrates this).
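To make this concrete, here is a small brute-force check (my own illustration, not from the book): over a grid of candidate weights, no single-layer perceptron with a hard threshold reproduces the XOR targets.

    import itertools
    import numpy as np

    # XOR truth table: output 1 only when the two inputs differ.
    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
    T = np.array([0, 1, 1, 0])

    def classifies_xor(w1, w2, b):
        # A single-layer perceptron with a hard threshold at zero.
        return all(int(w1 * x1 + w2 * x2 + b > 0) == t for (x1, x2), t in zip(X, T))

    grid = np.linspace(-2, 2, 41)
    found = any(classifies_xor(w1, w2, b)
                for w1, w2, b in itertools.product(grid, repeat=3))
    print("linear separation of XOR found on this grid:", found)   # prints False

A finite grid search is of course not a proof, but it reflects the geometric fact that no straight line separates the two XOR classes.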
IMPLICATIONS FOR AI RESEARCH

• Criticism and Impact: Their work initially led to a decline in interest and funding for neural network research during the 1970s, a period often referred to as the "AI Winter."
• Reevaluation and Revival: Despite the initial setback, their criticisms were crucial for the eventual resurgence of neural networks. Researchers recognized the need for multi-layer networks, leading to the development of more advanced models like multi-layer perceptrons (MLPs) and deep neural networks.
ADVANCES FOLLOWING MINSKY AND
PAPERT'S WORK

MULTI-LAYER PERCEPTRONS (MLPS):

• Introduction of additional layers (hidden layers) between the input and output layers.
• Use of non-linear activation functions (e.g., sigmoid, tanh, ReLU) in hidden layers.
• Ability to solve non-linearly separable problems and learn complex patterns (as shown in the sketch below).
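A minimal sketch of such a network: two inputs, one hidden layer of two units with a non-linear (threshold) activation, and one output unit. The weights are hand-picked for illustration (not learned) so that the network computes XOR:

    import numpy as np

    def step(z):
        # Hard-threshold activation, used here for simplicity.
        return (z > 0).astype(int)

    def mlp_xor(x):
        # Hidden layer: one unit computes (x1 OR x2), the other computes NOT(x1 AND x2).
        W1 = np.array([[1.0, -1.0],
                       [1.0, -1.0]])
        b1 = np.array([-0.5, 1.5])
        h = step(x @ W1 + b1)
        # Output layer: AND of the two hidden units gives XOR.
        w2 = np.array([1.0, 1.0])
        return int(w2 @ h - 1.5 > 0)

    for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
        print(x, "->", mlp_xor(np.array(x, dtype=float)))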

BACKPROPAGATION ALGORITHM:

• Developed in the 1980s, backpropagation is a supervised learning algorithm used to train multi-layer neural networks.
• It involves adjusting weights through gradient descent to minimize the error between predicted and actual outputs.
• It enabled the training of deep neural networks and addressed the limitations highlighted by Minsky and Papert.
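A compact sketch of backpropagation for a tiny one-hidden-layer network with sigmoid activations, trained on XOR by gradient descent (an illustrative implementation; the layer size, learning rate, and iteration count are arbitrary choices):

    import numpy as np

    rng = np.random.default_rng(0)
    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
    T = np.array([[0], [1], [1], [0]], dtype=float)

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    # One hidden layer with 4 units, one output unit.
    W1, b1 = rng.normal(size=(2, 4)), np.zeros(4)
    W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)
    lr = 0.5

    for _ in range(10000):
        # Forward pass
        H = sigmoid(X @ W1 + b1)
        Y = sigmoid(H @ W2 + b2)
        # Backward pass: cross-entropy loss with a sigmoid output gives dY = Y - T
        dY = Y - T
        dW2, db2 = H.T @ dY, dY.sum(axis=0)
        dH = (dY @ W2.T) * H * (1 - H)      # propagate the error through the hidden layer
        dW1, db1 = X.T @ dH, dH.sum(axis=0)
        # Gradient descent step
        W1 -= lr * dW1; b1 -= lr * db1
        W2 -= lr * dW2; b2 -= lr * db2

    print(np.round(Y, 2))   # should be close to [[0], [1], [1], [0]]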

NEURAL NETWORK ARCHITECTURES:

• Emergence of various neural network architectures, such as Convolutional Neural Networks (CNNs) for image processing and Recurrent Neural Networks (RNNs) for sequential data.
• Significant improvements in computational power and the availability of large datasets facilitated advancements in neural networks.

ACTIVATION FUNCTIONS
Session - 6

ACTIVATION FUNCTIONS

• Activation functions are applied to the weighted sum of inputs and biases in a neural network and are used to decide whether a neuron should be activated.
• Activation functions play an integral role in neural networks by introducing nonlinearity.
• This nonlinearity allows neural networks to develop complex representations and functions of the inputs that would not be possible with a simple linear regression model.

TYPES OF ACTIVATION FUNCTIONS

• Linear Activation Functions


• Sigmoid
• ReLU
• Leaky ReLU
• Tanh
• Step Function

LINEAR ACTIVATION FUNCTIONS

• The linear activation function, also known as "no activation" or the "identity function" (the input is simply multiplied by 1.0), is one where the activation is proportional to the input.
• The function does nothing to the weighted sum of the inputs; it simply passes on the value it was given.
• Mathematically, it can be represented as

  f(x) = x

LIMITATIONS OF LINEAR ACTIVATION
FUNCTION
• Backpropagation cannot be used effectively, because the derivative of the function is a constant and carries no information about the input x.
• All layers of the neural network collapse into one if a linear activation function is used. No matter how many layers the neural network has, the last layer is still a linear function of the first layer; essentially, a linear activation function turns the neural network into a single layer (the sketch below illustrates this).
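A quick numerical illustration of this collapse (my own example): two stacked purely linear layers are exactly equivalent to one linear layer whose weight matrix is the product of the two.

    import numpy as np

    rng = np.random.default_rng(1)
    W1, b1 = rng.normal(size=(3, 4)), rng.normal(size=4)   # first linear layer
    W2, b2 = rng.normal(size=(4, 2)), rng.normal(size=2)   # second linear layer
    x = rng.normal(size=3)

    # Two stacked linear layers...
    two_layers = (x @ W1 + b1) @ W2 + b2
    # ...equal a single linear layer with W = W1 @ W2 and b = b1 @ W2 + b2.
    one_layer = x @ (W1 @ W2) + (b1 @ W2 + b2)

    print(np.allclose(two_layers, one_layer))   # True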

BINARY STEP FUNCTION

• The binary step function depends on a threshold value that decides whether a neuron should be activated or not.
• The input fed to the activation function is compared to a certain threshold; if the input is greater than it, the neuron is activated, otherwise it is deactivated, meaning that its output is not passed on to the next hidden layer.
• Mathematically, it can be represented as:

  f(x) = 1 if x ≥ 0, else f(x) = 0
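In Python (a minimal sketch, with the threshold fixed at 0 as is conventional):

    import numpy as np

    def binary_step(x, threshold=0.0):
        # 1 when the input reaches the threshold, otherwise 0.
        return np.where(x >= threshold, 1, 0)

    print(binary_step(np.array([-2.0, -0.1, 0.0, 3.5])))   # [0 0 1 1]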

LIMITATIONS OF BINARY STEP FUNCTION

• It cannot provide multi-valued outputs; for example, it cannot be used for multi-class classification problems.
• The gradient of the step function is zero, which causes a hindrance in the backpropagation process.

SIGMOID / LOGISTIC ACTIVATION FUNCTION

• This function takes any real value as input and outputs values in the range of 0 to 1.
• The larger the input (more positive), the closer the output value will be to 1.0, whereas the smaller the input (more negative), the closer the output will be to 0.0.
• Mathematically, it can be represented as

  f(x) = 1 / (1 + e^(−x))
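In Python (a minimal sketch):

    import numpy as np

    def sigmoid(x):
        # Maps any real input into the open interval (0, 1).
        return 1.0 / (1.0 + np.exp(-x))

    print(sigmoid(np.array([-5.0, 0.0, 5.0])))   # roughly [0.007, 0.5, 0.993]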

LIMITATIONS OF SIGMOID ACTIVATION
FUNCTION
• The derivative of the function is f'(x) = sigmoid(x)·(1 − sigmoid(x)).
• The gradient values are only significant in the range −3 to 3; the curve becomes much flatter outside this region.
• This implies that for inputs greater than 3 or less than −3, the function has very small gradients. As the gradient approaches zero, the network effectively stops learning and suffers from the vanishing gradient problem (a numeric check follows).
• The output of the logistic function is not symmetric around zero, so the outputs of all the neurons will be of the same sign. This makes the training of the neural network more difficult and unstable.
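A small numeric check of the vanishing-gradient claim (illustrative): the derivative sigmoid(x)·(1 − sigmoid(x)) peaks at 0.25 at x = 0 and is already tiny beyond |x| = 3.

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def sigmoid_grad(x):
        s = sigmoid(x)
        return s * (1 - s)           # f'(x) = sigmoid(x) * (1 - sigmoid(x))

    for x in [0.0, 3.0, 6.0, 10.0]:
        print(x, float(sigmoid_grad(x)))
    # 0.0 -> 0.25, 3.0 -> ~0.045, 6.0 -> ~0.0025, 10.0 -> ~0.000045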

TANH FUNCTION (HYPERBOLIC TANGENT)

• The tanh function is very similar to the sigmoid/logistic activation function, and even has the same S-shape, with the difference being an output range of −1 to 1. In tanh, the larger the input (more positive), the closer the output value will be to 1.0, whereas the smaller the input (more negative), the closer the output will be to −1.0.
• Mathematically, it can be represented as

  f(x) = (e^x − e^(−x)) / (e^x + e^(−x))
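In Python (np.tanh computes this directly):

    import numpy as np

    def tanh(x):
        # Equivalent to (e^x - e^-x) / (e^x + e^-x); zero-centered output in (-1, 1).
        return np.tanh(x)

    print(tanh(np.array([-3.0, 0.0, 3.0])))   # roughly [-0.995, 0.0, 0.995]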

ADVANTAGE OF TANH FUNCTION

• The output of the tanh activation function is zero-centered; hence we can easily map the output values as strongly negative, neutral, or strongly positive.
• It is usually used in hidden layers of a neural network, as its values lie between −1 and 1; therefore, the mean of the hidden layer's outputs comes out to be 0 or very close to it. This helps in centering the data and makes learning for the next layer much easier.

LIMITATIONS OF TANH FUNCTION

• Tanh also faces the problem of vanishing gradients, similar to the sigmoid activation function, although the gradient of the tanh function is much steeper than that of the sigmoid.
• Note: Although both sigmoid and tanh face the vanishing gradient issue, tanh is zero-centered and its gradients are not restricted to move in a certain direction. Therefore, in practice, the tanh nonlinearity is always preferred to the sigmoid nonlinearity.

RELU FUNCTION

• ReLU stands for Rectified Linear Unit.
• Although it gives the impression of a linear function, ReLU has a derivative and allows for backpropagation, while simultaneously being computationally efficient.
• The ReLU function is simply max(0, x), which can also be thought of as a piecewise function where all inputs less than 0 map to 0 and all inputs greater than or equal to 0 map back to themselves (i.e., the identity function).
• Mathematically, it can be represented as

  f(x) = max(0, x)
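In Python (a one-line sketch):

    import numpy as np

    def relu(x):
        # max(0, x): negative inputs map to 0, non-negative inputs pass through unchanged.
        return np.maximum(0, x)

    print(relu(np.array([-2.0, 0.0, 3.5])))   # [0.  0.  3.5]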

ADVANTAGES OF RELU FUNCTION

• Since only a certain number of neurons are activated, the ReLU function is far more computationally efficient when compared to the sigmoid and tanh functions.
• ReLU accelerates the convergence of gradient descent towards the global minimum of the loss function due to its linear, non-saturating property.

LIMITATIONS OF RELU FUNCTION

• The dying ReLU problem: the negative side of the graph makes the gradient value zero. Because of this, during the backpropagation process, the weights and biases of some neurons are never updated, which can create dead neurons that never get activated.
• All negative input values become zero immediately, which decreases the model's ability to fit or train from the data properly.

LEAKY RELU FUNCTION

• Leaky ReLU is an improved version of the ReLU function, designed to solve the dying ReLU problem by having a small positive slope in the negative region.
• Mathematically, it can be represented as:

  f(x) = x for x ≥ 0, and f(x) = α·x for x < 0, where α is a small constant (e.g., 0.01)
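In Python, with the negative-side slope as a parameter (0.01 is a common choice, though the exact value varies):

    import numpy as np

    def leaky_relu(x, alpha=0.01):
        # Positive inputs pass through; negative inputs are scaled by the small slope alpha.
        return np.where(x >= 0, x, alpha * x)

    print(leaky_relu(np.array([-2.0, 0.0, 3.5])))   # [-0.02  0.    3.5 ]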

ADVANTAGES OF LEAKY RELU FUNCTION

• The advantages of Leaky ReLU are the same as those of ReLU, with the addition that it enables backpropagation even for negative input values.
• By making this minor modification for negative input values, the gradient on the left side of the graph becomes a non-zero value. Therefore, we no longer encounter dead neurons in that region.

LIMITATIONS OF LEAKY RELU FUNCTION

• The predictions may not be consistent for negative input values.
• The gradient for negative values is small, which makes learning the model parameters time-consuming.

THANKS
ANN TEAM
