AI Lec-Module-III
Module II
• Uncertainty in AI, Uncertainty, Probability, Syntax and Semantics, Inference,
Independence and Bayes' Rule, Bayesian Network, Neural Networks, Support
Vector Machine.
Module III
• Classification & Regression, Supervised, Unsupervised and Reinforcement
Learning, Theory, concepts and applications.
Module IV
• Applications of AI.
Module-III
Learning & Adaptation
Knowledge Representation
➢ Training Data
➢ Basic Rules
➢ Building prior information into NN design
▪ Restricting network architecture
▪ Constraining the choice of synaptic weights
Learning Process
➢ The NN is stimulated by an environment.
➢ The NN undergoes changes in its free parameters as a result of this stimulation.
➢ The NN responds in a new way to the environment.
Learning Paradigms
➢ Supervised (Learning with teacher)
➢ Unsupervised (Learning without teacher)
➢ Batch Learning
➢ Instantaneous Learning
➢ Offline Learning
➢ Online Learning (Learning-on-the-fly)
Learning Paradigms
Supervised (Learning with teacher)
Supervised learning is the process of training an algorithm to map an input to a particular output. This is achieved using the labelled datasets that you have collected. If the mapping is correct, the algorithm has successfully learned.
The teacher acts as a supervisor, or an authoritative source of information, that the student can rely on to guide their learning. You can also think of the student’s mind as a computational engine.
Learning Paradigms
Supervised Learning: Algorithms
Popular supervised learning algorithms include:
• Error-correction learning rule
• Linear regression
• Support Vector Machine (SVM)
• Logistic regression
• Random forest
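As a quick illustration, the sketch below shows a minimal supervised-learning workflow in Python. It assumes scikit-learn is available; the toy dataset and the choice of logistic regression are purely illustrative.

```python
# Minimal supervised-learning sketch (scikit-learn assumed available).
# The labelled toy dataset below is invented purely for illustration.
from sklearn.linear_model import LogisticRegression

# Labelled dataset: each input vector is paired with a known class label,
# which plays the role of the "teacher".
X = [[1.0, 2.0], [2.0, 1.0], [6.0, 5.0], [7.0, 6.0]]
y = [0, 0, 1, 1]

model = LogisticRegression()
model.fit(X, y)                     # learn the input-to-output mapping
print(model.predict([[6.5, 5.5]]))  # a nearby unseen point; expected class 1
```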
Learning Paradigms
Supervised (Learning with teacher)
Supervised learning has a number of challenges and disadvantages that you could face while working with these algorithms. Some of these include:
• You could easily overfit your algorithm
• Good examples need to be used to train the model
• Computation time is very large for supervised learning
• Unwanted data could reduce the accuracy
• Pre-processing of data is always a challenge
• If the dataset is labelled incorrectly, the algorithm learns incorrectly, which can lead to losses
Learning Paradigms
Unsupervised Learning
Unsupervised Learning can be thought of as self-learning where
the algorithm can find previously unknown patterns in datasets
that do not have any sort of labels. You do not interfere when
the algorithm learns.
It helps in modelling probability density functions, finding
anomalies in the data, and much more. For example, think of a
student who has textbooks and all the required material to study
but has no teacher to guide. Ultimately, the student will have to
learn by himself or herself to pass the exams.
Learning Paradigms
Unsupervised Learning
Unsupervised Learning can be classified under the following
two types:
• Clustering
• Association
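To make the contrast with supervised learning concrete, here is a minimal clustering sketch. It assumes scikit-learn is available; the toy data and the choice of k-means are illustrative.

```python
# Minimal unsupervised-learning sketch: k-means clustering (scikit-learn
# assumed available). No labels are supplied; the algorithm discovers the
# grouping on its own, and the user must interpret the resulting clusters.
from sklearn.cluster import KMeans

X = [[1.0, 1.0], [1.2, 0.8], [8.0, 8.0], [8.2, 7.9]]  # invented toy data
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(kmeans.labels_)  # cluster index assigned to each point, e.g. [0 0 1 1]
```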
Learning Paradigms
Unsupervised Learning
Although Unsupervised Learning is used in many well-known
applications and works brilliantly, there are still many
disadvantages, some of these are:
• There is no way of knowing how the data has been sorted, as the dataset is unlabeled.
• The results may be less accurate, since the input data is not labelled by humans and the machine has to do it.
• The information obtained by the algorithm may not always correspond to the output classes that we require.
• The user has to understand and map the output obtained to the corresponding labels.
Learning Paradigms
Reinforcement Learning
In reinforcement learning, the learning of an input-output
mapping is performed through continued interaction with the
environment in order to minimize a scalar index of
performance.
Reinforcement learning is
closely related to dynamic
programming.
It solves a particular kind of
problem where decision
making is sequential, and the
goal is long-term.
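Since decision making here is sequential and the goal is long-term, a tabular Q-learning sketch illustrates the idea. Everything below (the tiny chain environment, rewards, and hyperparameters) is invented for illustration.

```python
# Minimal tabular Q-learning sketch on an invented 3-state chain environment.
# The agent learns from scalar rewards, not labelled examples: reward arrives
# only at the terminal state, so the value of earlier actions is long-term.
import random
random.seed(0)

n_states, n_actions = 3, 2        # states 0..2; action 1 = right, 0 = left
Q = [[0.0] * n_actions for _ in range(n_states)]
alpha, gamma, epsilon = 0.1, 0.9, 0.3

for episode in range(500):
    s = 0
    for _ in range(50):                            # cap the episode length
        if random.random() < epsilon:              # explore
            a = random.randrange(n_actions)
        else:                                      # exploit current estimates
            a = max(range(n_actions), key=lambda i: Q[s][i])
        s_next = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
        r = 1.0 if s_next == n_states - 1 else 0.0
        Q[s][a] += alpha * (r + gamma * max(Q[s_next]) - Q[s][a])
        s = s_next
        if s == n_states - 1:                      # terminal state reached
            break

print(Q)  # the "move right" column should carry the larger values
```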
Learning Paradigms
Supervised vs Unsupervised Learning
In the most direct route, the error values can be used to directly adjust the tap weights, using an algorithm such as the backpropagation algorithm.
Learning Rules
Error Correction Learning: Illustration
Let’s consider a FFNN with only a single neuron k as the
computational node in the output layer. Neuron k is driven by a
signal vector x(n) produced by one or more layers of hidden
neurons.
The error signal is given as e_k(n) = d_k(n) − y_k(n), where d_k(n) is the desired response. The error-correction (delta) rule then adjusts the synaptic weights as Δw_kj(n) = η e_k(n) x_j(n), where η is the learning-rate parameter.
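A minimal sketch of this error-correction step, assuming NumPy; the single linear neuron, learning rate, and data are illustrative.

```python
# Minimal error-correction (delta rule) sketch for one output neuron k.
# NumPy assumed; the learning rate and signals are invented for illustration.
import numpy as np

eta = 0.1                    # learning-rate parameter
w = np.zeros(2)              # synaptic weights of neuron k
x = np.array([1.0, 0.5])     # input signal vector x(n)
d = 1.0                      # desired response d(n)

y = w @ x                    # actual response y_k(n) of a linear neuron
e = d - y                    # error signal e_k(n) = d_k(n) - y_k(n)
w += eta * e * x             # delta rule: w_kj <- w_kj + eta * e_k(n) * x_j(n)
print(w)                     # the weights move so as to reduce the error
```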
Learning Rules
Hebbian Learning
Hebb’s postulate of learning is the oldest and the most famous of all
learning rules; it is named in honor of the neuropsychologist Hebb.
➢ Hebb’s hypothesis
Δw_kj(n) = η y_k(n) x_j(n)
➢ Covariance hypothesis
Δw_kj(n) = η (x_j(n) − x̄)(y_k(n) − ȳ)
where η is the learning-rate parameter, and x̄ and ȳ are the time-averaged values of the presynaptic signal x_j and the postsynaptic signal y_k. The overall effect of Hebb’s rule is to strengthen the synaptic weight w_kj when the presynaptic and postsynaptic signals are simultaneously active.
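A minimal sketch of Hebb’s hypothesis, assuming NumPy; the weights and signals are invented for illustration.

```python
# Minimal Hebbian learning sketch: the weight change is proportional to the
# product of presynaptic and postsynaptic activity. NumPy assumed.
import numpy as np

eta = 0.05                   # learning-rate parameter
w = np.array([0.2, -0.1])    # synaptic weights of neuron k
x = np.array([1.0, 0.5])     # presynaptic signals x_j(n)
y = w @ x                    # postsynaptic signal y_k(n)

w += eta * y * x             # Hebb: dw_kj(n) = eta * y_k(n) * x_j(n)
print(w)                     # correlated activity strengthens the synapses
```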
Learning Rules
Boltzmann Learning
The Boltzmann learning rule, named in honor of Ludwig Boltzmann, is a stochastic learning algorithm derived from ideas rooted in statistical mechanics. A neural network designed on the basis of the Boltzmann learning rule is called a Boltzmann machine.
In a Boltzmann machine the neurons constitute a recurrent structure, and they operate in a binary manner: they are either in an ‘on’ state, denoted by +1, or in an ‘off’ state, denoted by −1. The machine is characterised by an energy function E, the value of which is determined by the particular states occupied by the individual neurons of the machine.
Learning Rules
Boltzmann Learning
The energy function is given as:
E = −(1/2) Σ_j Σ_k w_kj x_k x_j,  j ≠ k
where x_j is the state of neuron j and w_kj is the synaptic weight from neuron j to neuron k.
Perceptron Convergence Algorithm
• Step 1: Initialization
– Set w(0) = 0, then do the following for n = 1, 2, 3, …
• Step 2: Activation
– Activate the perceptron by applying input vector x(n) and desired output d(n)
• Step 3: Computation of actual response
– y(n) = sgn(wᵀ(n) x(n))
• Step 4: Adaptation of the weight vector
– w(n+1) = w(n) + η [d(n) − y(n)] x(n)
where
d(n) = +1 if x(n) belongs to C1
d(n) = −1 if x(n) belongs to C2
• Step 5: Continuation
– Increment n by 1, and go back to step 2
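The steps above translate directly into code. Below is a minimal sketch, assuming NumPy; the linearly separable toy data and learning rate are illustrative.

```python
# Minimal perceptron-convergence sketch following the steps above.
# NumPy assumed; the linearly separable toy data is invented.
import numpy as np

X = np.array([[2.0, 1.0], [1.0, 1.5], [-1.0, -1.0], [-2.0, -0.5]])
d = np.array([1, 1, -1, -1])        # +1 for class C1, -1 for class C2
w = np.zeros(3)                     # [bias, w1, w2]; Step 1: w(0) = 0
eta = 1.0

for epoch in range(10):             # a few passes over the data suffice here
    for x_n, d_n in zip(X, d):
        x_aug = np.concatenate(([1.0], x_n))  # fold the bias into the input
        y_n = 1 if w @ x_aug >= 0 else -1     # Step 3: actual response sgn(.)
        w += eta * (d_n - y_n) * x_aug        # Step 4: weight adaptation
print(w)                            # a separating weight vector for the data
```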
Optimization Techniques
➢ Method of Steepest Descent
➢ Newton’s Method
➢ Gauss-Newton Method
➢ Least Mean Square Algorithm
Method of Steepest Descent
➢ The necessary condition for optimality is that the gradient of the cost function should be zero, i.e. ∇E(w) = 0, where the cost function is the sum of error squares:
E(w) = (1/2) Σ_{i=1}^{N} (d_i − F(x_i, w))²
➢ Steepest descent adjusts the weight vector in the direction opposite to the gradient:
w(n+1) = w(n) − η ∇E(w(n))
where η is the step-size (learning-rate) parameter.
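A minimal steepest-descent sketch, assuming NumPy; the quadratic cost, step size, and iteration count are invented for illustration.

```python
# Minimal steepest-descent sketch on the invented quadratic cost
# E(w) = (w1 - 3)^2 + (w2 + 1)^2, whose minimum is at w = [3, -1].
import numpy as np

def grad(w):
    # Analytic gradient of the cost E(w) above.
    return np.array([2 * (w[0] - 3), 2 * (w[1] + 1)])

w = np.zeros(2)
eta = 0.1                       # step-size (learning-rate) parameter
for _ in range(100):
    w -= eta * grad(w)          # w(n+1) = w(n) - eta * grad E(w(n))
print(w)                        # converges towards the optimum [3, -1]
```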
[Figure: a multilayer feed-forward network with an input layer (inputs x_i), a hidden layer (weights w_ij), and an output layer (weights w_jk, outputs y_k); error signals propagate backward through the network.]
Back-Propagation Algorithm
Derivation:
The error energy at the output is E(n) = (1/2) Σ_{j∈C} e_j²(n). Applying the chain rule requires the following partial derivatives:
∂E(n)/∂e_j(n) = e_j(n)
∂e_j(n)/∂y_j(n) = −1, since e_j(n) = d_j(n) − y_j(n)
∂y_j(n)/∂v_j(n) = φ′(v_j(n)), since y_j(n) = φ(v_j(n))
∂v_j(n)/∂w_ji(n) = y_i(n), since v_j(n) = Σ_i w_ji(n) y_i(n)
Back-Propagation Algorithm
Derivation:
The local gradient is δ_j(n) = e_j(n) φ′(v_j(n)), and the equation for the weight corrections can be written as:
Δw_ji(n) = η δ_j(n) y_i(n)
Disadvantages:
➢ Slow convergence, large training time.
➢ Chances of being trapped in local minima.
➢ Works with only differentiable activation functions.
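Putting the derivation together, here is a compact backpropagation sketch, assuming NumPy; the 2-2-1 architecture, XOR data, learning rate, and epoch count are illustrative. Note that an unlucky initialization can stall in a local minimum, which is one of the disadvantages listed above.

```python
# Compact backpropagation sketch: a 2-2-1 sigmoid network trained on XOR.
# NumPy assumed; architecture and hyperparameters are illustrative.
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
d = np.array([[0], [1], [1], [0]], dtype=float)

W1 = rng.normal(0, 1, (2, 2)); b1 = np.zeros(2)   # hidden layer
W2 = rng.normal(0, 1, (2, 1)); b2 = np.zeros(1)   # output layer
eta = 0.5
sigmoid = lambda v: 1.0 / (1.0 + np.exp(-v))

for epoch in range(10000):
    # Forward pass
    h = sigmoid(X @ W1 + b1)
    y = sigmoid(h @ W2 + b2)
    # Backward pass: local gradients delta_j = e_j * phi'(v_j)
    e = d - y
    delta2 = e * y * (1 - y)                  # output-layer local gradient
    delta1 = (delta2 @ W2.T) * h * (1 - h)    # hidden-layer local gradient
    # Weight corrections: dw = eta * delta * (input signal to that layer)
    W2 += eta * h.T @ delta2; b2 += eta * delta2.sum(axis=0)
    W1 += eta * X.T @ delta1; b1 += eta * delta1.sum(axis=0)

print(y.round(2).ravel())   # often approaches [0, 1, 1, 0] after training
```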
Support Vector Machine
Overview
Support Vector Machine (SVM) is a powerful supervised
machine learning algorithm used for linear or nonlinear
classification, regression, and even outlier detection tasks.
The main objective of the SVM algorithm is to find the
optimal hyperplane in an N-dimensional space that can
separate the data points in different classes in the feature
space.
The hyperplane is chosen so that the margin between the closest points of the different classes is as large as possible.
Support Vector Machine
Terminology
Hyperplane: Hyperplane is the decision boundary that is
used to separate the data points of different classes in a
feature space. In the case of linear classification, it is a linear equation, i.e. wᵀx + b = 0.
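A minimal SVM sketch, assuming scikit-learn; the toy data is invented. With a linear kernel the learned hyperplane parameters w and b can be read off directly.

```python
# Minimal SVM sketch (scikit-learn assumed available); invented toy data.
# A linear kernel exposes the hyperplane w.x + b = 0 directly.
from sklearn.svm import SVC

X = [[2.0, 2.0], [1.5, 2.5], [-2.0, -1.0], [-1.5, -2.0]]
y = [1, 1, -1, -1]

clf = SVC(kernel="linear").fit(X, y)
print(clf.coef_, clf.intercept_)   # w and b of the separating hyperplane
print(clf.support_vectors_)        # the closest points, which fix the margin
```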
Classification & Regression
Regression
Regression is a process of finding the correlations between dependent and independent variables. It helps in predicting continuous variables, such as market trends or house prices.
Classification & Regression
Regression: Relationship type
[Scatter plots: examples of strong and weak linear relationships between X and Y.]
Classification & Regression
Regression: Relationship type
[Scatter plot: no relationship between X and Y.]
Classification & Regression
Regression
Various types of regression algorithms are:
• Simple Linear Regression
• Multiple Linear Regression
• Polynomial Regression
• Support Vector Regression
• Decision Tree Regression
• Random Forest Regression
Classification & Regression
Linear Regression
Linear regression is a type of supervised machine learning
algorithm that computes the linear relationship between a
dependent variable and one or more independent features.
When there is a single independent feature, it is known as univariate (simple) linear regression; in the case of more than one feature, it is known as multivariate linear regression.
A simple linear regression involves only one independent
variable and one dependent variable. The equation for simple
linear regression is:
Ŷ = θ1 + θ2·X
Classification & Regression
Linear Regression
The goal of the algorithm is to find the best-fit line equation
that can predict the values based on the independent variables.
The best-fit line equation provides a straight line that
represents the relationship between the dependent and
independent variables. The best-fit line implies that the error
between the predicted and actual values should be kept to a
minimum.
We utilize the cost function to compute the best values in
order to get the best fit line since different values for weights
or the coefficient of lines result in different regression lines.
Classification & Regression
Linear Regression
The regression line is a hypothetical function, as the actual data distribution is not strictly linear. Hence, there exists some difference between the actual output Y and the predicted output Ŷ. Accordingly, the cost function is defined as the Mean Squared Error (MSE), which calculates the average of the squared errors between the predicted values Ŷ and the actual values Y. The purpose is to determine the optimal values for the intercept θ1 and the coefficient of the input feature θ2 providing the best-fit line for the given data points.
Classification & Regression
Linear Regression: Gradient Descent optimization
The MSE cost function can be calculated as:
J = (1/n) Σ_{i=1}^{n} (ŷ_i − y_i)²
Different values for the weights, or coefficients of the line (θ1, θ2), give different regression lines, so we need to calculate the best values for θ1 and θ2 to find the best-fit line; to do this, we use the above cost function.
Utilizing the MSE function, the iterative process of gradient
descent is applied to update the values of 𝜃1 and 𝜃2 .
Courtesy: https://www.geeksforgeeks.org/
Classification & Regression
Linear Regression: Gradient Descent optimization
The gradient descent optimization algorithm trains the linear regression model by iteratively modifying the model’s parameters to reduce the mean squared error (MSE) of the model on a training dataset. The update rules are given as:
θ1 = θ1 − η J′_θ1 = θ1 − η (2/n) Σ_{i=1}^{n} (ŷ_i − y_i)
θ2 = θ2 − η J′_θ2 = θ2 − η (2/n) Σ_{i=1}^{n} (ŷ_i − y_i)·x_i
Courtesy: https://www.geeksforgeeks.org/
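The update rules above can be implemented in a few lines. Below is a minimal sketch, assuming NumPy; the data (points on a known line), learning rate, and epoch count are invented for illustration.

```python
# Minimal gradient-descent sketch for simple linear regression, implementing
# the update rules above. NumPy assumed; data and hyperparameters invented.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = 2.0 + 3.0 * x                       # invented data on the line y = 2 + 3x
theta1, theta2, eta, n = 0.0, 0.0, 0.01, len(x)

for _ in range(5000):
    y_hat = theta1 + theta2 * x                           # predictions
    theta1 -= eta * (2 / n) * np.sum(y_hat - y)           # dJ/dtheta1 step
    theta2 -= eta * (2 / n) * np.sum((y_hat - y) * x)     # dJ/dtheta2 step

print(theta1, theta2)                   # approaches the true values 2 and 3
```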
Classification & Regression
Linear Regression: Least Mean Square
Another way to find the values of model parameters that
minimize the error is the Ordinary Least Squares method. The
formulas for θ1 and θ2 in terms of the data points are:
θ2 = Σ_{i=1}^{n} (x_i − x̄)(y_i − ȳ) / Σ_{i=1}^{n} (x_i − x̄)²
θ1 = ȳ − θ2·x̄
Courtesy: https://www.geeksforgeeks.org/
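For comparison with gradient descent, the closed-form formulas above can be evaluated directly. A minimal sketch, assuming NumPy and reusing the invented data from the previous sketch.

```python
# Minimal ordinary-least-squares sketch using the closed-form formulas above.
# NumPy assumed; the invented noiseless data lies exactly on y = 2 + 3x.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = 2.0 + 3.0 * x

theta2 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
theta1 = y.mean() - theta2 * x.mean()
print(theta1, theta2)   # exactly 2.0 and 3.0, since the data is noiseless
```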
Classification & Regression
Evaluating a Regression model
A variety of evaluation measures can be used to determine the
strength of any linear regression model. These assessment
metrics often give an indication of how well the model is
producing the observed outputs. Some of these are:
• R-squared method
• Mean Square Error (MSE)
• Mean Absolute Error (MAE)
• Root Mean Squared Error (RMSE)
Courtesy: https://www.geeksforgeeks.org/
Classification & Regression
Evaluating a Regression model
• R-squared method: It is a statistical method that determines
the goodness of fit. It can be calculated from the below
formula:
R² = Explained variation / Total variation
R² = SS_regression / SS_total = 1 − SS_res / SS_total
Courtesy: https://www.geeksforgeeks.org/
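A minimal sketch of the R-squared computation, assuming NumPy; the observed values and predictions are invented for illustration.

```python
# Minimal R-squared sketch: R^2 = 1 - SS_res / SS_total. NumPy assumed.
import numpy as np

y = np.array([3.0, 5.0, 7.0, 9.0])       # observed values (invented)
y_hat = np.array([2.8, 5.1, 7.2, 8.9])   # model predictions (invented)

ss_res = np.sum((y - y_hat) ** 2)        # residual sum of squares
ss_total = np.sum((y - y.mean()) ** 2)   # total sum of squares
print(1 - ss_res / ss_total)             # close to 1 indicates a good fit
```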
Classification & Regression
Graphical view
[Plot: the linear model Ŷ = θ1 + θ2·X compared with the mean model Ȳ.]
The root mean squared error is given as:
RMSE = √( (1/n) Σ_{i=1}^{n} (y_i − ŷ_i)² )
Classification & Regression
Example:
A real estate agent wishes to examine the relationship between the selling price of a home (Y, in $1000s) and its size measured in square feet (X). A random sample of 10 houses is selected.

House Price in $1000s (Y)   Square Feet (X)
245                         1400
312                         1600
279                         1700
405                         2350
324                         2450
319                         1425
255                         1700
…                           …

[Scatter plot: house price ($1000s) versus square feet.]
Classification & Regression
Example:
Regression Statistics
R Square = 0.58082
Observations = 10
Intercept θ1 = 98.248
Slope θ2 = 0.10977

[Scatter plot: house price versus square feet with the fitted regression line.]

The estimated equation is Ŷ = 98.25 + 0.1098·X. For a house of 2000 square feet:
Ŷ = 98.25 + 0.1098 × 2000 = 317.85
that is, a predicted selling price of about $317,850.
Classification & Regression
Example-2:
Estimate the relationship between years of experience (X) and salary (Y, in $1,000s).

X (years)   Y (salary, $1,000s)
3           30
8           57
9           64
13          72
3           36
11          59
21          90
1           20
…           …

Applying the least-squares formulas:
θ2 = Σ_{i=1}^{n} (x_i − x̄)(y_i − ȳ) / Σ_{i=1}^{n} (x_i − x̄)²
θ1 = ȳ − θ2·x̄
gives θ1 = 23.2 and θ2 = 3.5, i.e., Ŷ = 23.2 + 3.5·X.

[Scatter plot: salary versus years of experience with the fitted regression line.]
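As a check, the formulas can be applied to the rows of the table that survive in these notes; since that table appears to be incomplete, the computed values may differ slightly from the quoted θ1 = 23.2 and θ2 = 3.5. NumPy assumed.

```python
# Least-squares fit on the recovered rows of the Example-2 table (possibly
# incomplete, so the result may differ slightly from the quoted answer).
import numpy as np

x = np.array([3, 8, 9, 13, 3, 11, 21, 1], dtype=float)       # years
y = np.array([30, 57, 64, 72, 36, 59, 90, 20], dtype=float)  # salary ($1,000s)

theta2 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
theta1 = y.mean() - theta2 * x.mean()
print(theta1, theta2)          # roughly 23.8 and 3.4 on these eight rows
print(theta1 + theta2 * 10)    # predicted salary for 10 years of experience
```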