
UNIT-2

 Regression:

Regression analysis is a statistical method for modelling the relationship between a dependent (target) variable and one or more independent (predictor) variables. More specifically, regression analysis helps us understand how the value of the dependent variable changes with respect to one independent variable when the other independent variables are held fixed. It predicts continuous/real values such as temperature, age, salary, price, etc.

Some examples of regression are:

o Prediction of rain using temperature and other factors
o Determining market trends
o Prediction of road accidents due to rash driving.

 Types of Regression:
Linear Regression:
o Linear regression is a statistical regression method used for predictive analysis.
o It is one of the simplest and easiest algorithms; it works on regression and models the relationship between continuous variables.
o It is used for solving regression problems in machine learning.
o Linear regression shows the linear relationship between the independent variable (X-axis) and the dependent variable (Y-axis), hence the name linear regression.
o If there is only one input variable (x), it is called simple linear regression; if there is more than one input variable, it is called multiple linear regression.

Below is the mathematical equation for linear regression:

Y = aX + b

Here, Y = dependent variable (target variable),
X = independent variable (predictor variable),
a and b are the linear coefficients (slope and intercept).
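
As an illustrative sketch (not from the original notes; the data points below are invented), the coefficients a and b can be estimated by ordinary least squares, e.g. with NumPy:

    # A minimal sketch of simple linear regression (Y = aX + b) using NumPy.
    # The data below is made up for illustration.
    import numpy as np

    X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # independent variable
    Y = np.array([2.1, 4.1, 6.2, 7.9, 10.1])  # dependent variable

    # np.polyfit with degree 1 returns the least-squares slope a and intercept b
    a, b = np.polyfit(X, Y, 1)
    print(f"Y = {a:.2f}*X + {b:.2f}")          # roughly Y = 1.98*X + 0.14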

Some popular applications of linear regression are:

o Analyzing trends and sales estimates
o Salary forecasting
o Real estate prediction
o Arriving at ETAs in traffic.

Logistic Regression:
o Logistic regression is another supervised learning algorithm, used to solve classification problems. In classification problems, the dependent variable is in a binary or discrete format, such as 0 or 1.
o The logistic regression algorithm works with categorical variables such as 0 or 1, Yes or No, True or False, Spam or Not Spam, etc.
o It is a predictive analysis algorithm that works on the concept of probability.
o Logistic regression is a type of regression, but it differs from the linear regression algorithm in how it is used.
o Logistic regression uses the sigmoid function (also called the logistic function) to map predicted values to probabilities. This sigmoid function is used to model the data in logistic regression. The function can be represented as:

f(x) = 1 / (1 + e^(-x))

o f(x) = output, between the values 0 and 1
o x = input to the function
o e = base of the natural logarithm.
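
A small sketch (not part of the original notes) of the sigmoid function in Python, showing how any real input is squashed into the (0, 1) range:

    # Minimal sketch of the sigmoid (logistic) function used by logistic regression.
    import math

    def sigmoid(x):
        """Map any real number into the open interval (0, 1)."""
        return 1.0 / (1.0 + math.exp(-x))

    print(sigmoid(-5))   # close to 0  -> class 0
    print(sigmoid(0))    # exactly 0.5 -> the decision boundary
    print(sigmoid(5))    # close to 1  -> class 1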

 Bayesian Learning:

Bayesian learning (BL) in machine learning is an approach that uses Bayesian probability theory to model data and make predictions about it.

Key aspects of Bayesian learning include probability distributions, Bayes' theorem, parameter estimation, decision making, model selection, and the ability to handle small datasets.
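
As a hedged illustration of Bayesian parameter estimation (the coin-flip counts and the Beta prior below are invented for the example), observed data updates a prior belief into a posterior:

    # Sketch of Bayesian parameter estimation: updating a Beta prior over a
    # coin's probability of heads after observing data (example values invented).
    prior_alpha, prior_beta = 1, 1   # Beta(1, 1) = uniform prior
    heads, tails = 7, 3              # observed data: 7 heads, 3 tails

    # Conjugate update: the posterior is Beta(alpha + heads, beta + tails)
    post_alpha = prior_alpha + heads
    post_beta = prior_beta + tails
    print(post_alpha / (post_alpha + post_beta))   # posterior mean = 8/12 ~ 0.67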

 Bayes Classifier:

A Bayes classifier is a type of probabilistic classifier that uses Bayes' theorem to make predictions. It computes the probability of a data point belonging to each class and then selects the class with the highest probability.

There are several types of Bayes classifiers, including Naïve Bayes and the Bayesian network.
 Bayes Theorem:

Bayes' theorem is also known as Bayes' rule or Bayes' law. It is used to determine the conditional probability of event A when event B has already happened. The general statement of Bayes' theorem is: "The conditional probability of an event A, given the occurrence of another event B, is equal to the product of the probability of B given A and the probability of A, divided by the probability of event B." i.e.

P(A|B) = P(B|A) P(A) / P(B)

where,

P(A) and P(B) are the probabilities of events A and B,

P(A|B) is the probability of event A when event B happens,

P(B|A) is the probability of event B when A happens.
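
As a quick sanity check (the probabilities below are invented for illustration), the formula can be evaluated directly:

    # Direct evaluation of Bayes' theorem with made-up probabilities.
    p_a = 0.3            # P(A)
    p_b_given_a = 0.8    # P(B|A)
    p_b = 0.5            # P(B)

    p_a_given_b = p_b_given_a * p_a / p_b
    print(p_a_given_b)   # 0.8 * 0.3 / 0.5 = 0.48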

 Naïve Bayes Classifier:

o The Naïve Bayes algorithm is a supervised learning algorithm, based on Bayes' theorem and used for solving classification problems.
o It is mainly used in text classification, which involves high-dimensional training datasets.
o The Naïve Bayes classifier is one of the simplest and most effective classification algorithms; it helps build fast machine learning models that can make quick predictions.
o It is a probabilistic classifier, which means it predicts on the basis of the probability of an object belonging to each class.
o Some popular applications of the Naïve Bayes algorithm are spam filtering, sentiment analysis, and classifying articles.

Working of Naïve Bayes' Classifier:

Working of Naïve Bayes' Classifier can be understood with the help of the
below example:

Suppose we have a dataset of weather conditions and a corresponding target variable "Play". Using this dataset, we need to decide whether we should play or not on a particular day, according to the weather conditions. To solve this problem, we follow the steps below:

1. Convert the given dataset into frequency tables.
2. Generate a likelihood table by finding the probabilities of the given features.
3. Use Bayes' theorem to calculate the posterior probability.

Problem: If the weather is sunny, should the player play or not?

Solution: To solve this, first consider the dataset below:

     Outlook    Play
0    Rainy      Yes
1    Sunny      Yes
2    Overcast   Yes
3    Overcast   Yes
4    Sunny      No
5    Rainy      Yes
6    Sunny      Yes
7    Overcast   Yes
8    Rainy      No
9    Sunny      No
10   Sunny      Yes
11   Rainy      No
12   Overcast   Yes
13   Overcast   Yes

Frequency table for the weather conditions:

Weather    Yes   No
Overcast   5     0
Rainy      2     2
Sunny      3     2
Total      10    4

Likelihood table for the weather conditions:

Weather    No            Yes           P(Weather)
Overcast   0             5             5/14 = 0.35
Rainy      2             2             4/14 = 0.29
Sunny      2             3             5/14 = 0.35
All        4/14 = 0.29   10/14 = 0.71

Applying Bayes' theorem:

P(Yes|Sunny) = P(Sunny|Yes) * P(Yes) / P(Sunny)

P(Sunny|Yes) = 3/10 = 0.3

P(Sunny) = 5/14 = 0.35

P(Yes) = 10/14 = 0.71

So, P(Yes|Sunny) = 0.3 * 0.71 / 0.35 = 0.60

P(No|Sunny) = P(Sunny|No) * P(No) / P(Sunny)

P(Sunny|No) = 2/4 = 0.5

P(No) = 4/14 = 0.29

P(Sunny) = 5/14 = 0.35

So, P(No|Sunny) = 0.5 * 0.29 / 0.35 = 0.41

As we can see from the above calculation, P(Yes|Sunny) > P(No|Sunny).

Hence, on a sunny day, the player can play the game.
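
The same posterior calculation can be reproduced in a few lines of Python (a sketch using only the counts from the tables above):

    # Reproducing the worked example: posteriors for Outlook = Sunny,
    # using the frequency counts from the tables above.
    p_sunny_given_yes = 3 / 10   # P(Sunny|Yes)
    p_sunny_given_no = 2 / 4     # P(Sunny|No)
    p_yes, p_no = 10 / 14, 4 / 14
    p_sunny = 5 / 14

    p_yes_given_sunny = p_sunny_given_yes * p_yes / p_sunny   # = 0.60
    p_no_given_sunny = p_sunny_given_no * p_no / p_sunny      # = 0.40 (0.41 with the rounded values above)
    print("Play" if p_yes_given_sunny > p_no_given_sunny else "Don't play")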

 Support Vector Machine:

The Support Vector Machine, or SVM, is one of the most popular supervised learning algorithms; it is used for classification as well as regression problems. However, it is primarily used for classification problems in machine learning.

The goal of the SVM algorithm is to create the best line or decision boundary that can segregate n-dimensional space into classes, so that new data points can easily be placed in the correct category in the future. This best decision boundary is called a hyperplane.

Types of support vector machines

Support vector machines have different types and variants that provide specific functionalities and address specific problem scenarios. Here are two types of SVMs and their significance:

1. Linear SVM. Linear SVMs use a linear kernel to create a straight-line decision boundary that separates different classes. They are effective when the data is linearly separable or when a linear approximation is sufficient. Linear SVMs are computationally efficient and have good interpretability, as the decision boundary is a hyperplane in the input feature space.
2. Nonlinear SVM. Nonlinear SVMs address scenarios where the data cannot be separated by a straight line in the input feature space. They achieve this by using kernel functions that implicitly map the data into a higher-dimensional feature space, where a linear decision boundary can be found. Popular kernel functions used in this type of SVM include the polynomial kernel, Gaussian (RBF) kernel and sigmoid kernel. Nonlinear SVMs can capture complex patterns and achieve higher classification accuracy than linear SVMs; a short sketch follows this list.
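
A brief sketch contrasting the two types (assuming scikit-learn is available; the toy dataset is generated for illustration, not taken from the notes):

    # Sketch: linear vs. RBF-kernel SVMs on a toy dataset (scikit-learn assumed).
    from sklearn.datasets import make_circles
    from sklearn.svm import SVC

    # Concentric circles are NOT linearly separable in the input space.
    X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

    linear_svm = SVC(kernel="linear").fit(X, y)
    rbf_svm = SVC(kernel="rbf", gamma=2.0).fit(X, y)

    print("linear accuracy:", linear_svm.score(X, y))  # poor: no separating line exists
    print("rbf accuracy:", rbf_svm.score(X, y))        # near 1.0 via the kernel trick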

 Kernel Function:
A kernel function is a method used to take data as input and transform it into the form required for processing.
The term "kernel" refers to a set of mathematical functions used in the Support Vector Machine that provide a window through which to manipulate the data.
A kernel function generally transforms the training data so that a nonlinear decision surface in the original space corresponds to a linear decision boundary in a higher-dimensional space. Essentially, it returns the inner product between two points in a suitable feature space.
Types of kernels:
Linear Kernel
A linear kernel is a type of kernel function used in machine learning, including in SVMs (support vector machines). It is the simplest and most commonly used kernel function, and it is defined as the dot product between the input vectors in the original feature space.

The linear kernel can be defined as:

K(x, y) = x · y

where x and y are the input feature vectors. The dot product of the input vectors is a measure of their similarity or distance in the original feature space.
Gaussian (RBF) Kernel
The Gaussian kernel, also known as the radial basis function (RBF) kernel, is a popular kernel function used in machine learning, particularly in SVMs. It is a nonlinear kernel function that maps the input data into a higher-dimensional feature space using a Gaussian function.

The Gaussian kernel can be defined as:

K(x, y) = exp(-gamma * ||x - y||^2)

where x and y are the input feature vectors, gamma is a parameter that controls the width of the Gaussian function, and ||x - y||^2 is the squared Euclidean distance between the input vectors.

Polynomial Kernel:
It represents the similarity of vectors in the training data in a feature space over polynomials of the original variables used in the kernel. It can be defined as:

K(x, y) = (x · y + c)^d

where c is a constant and d is the degree of the polynomial.
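
A minimal sketch of the three kernels above, implemented with NumPy (the parameter values and test vectors are chosen only for illustration):

    # Sketch: the three kernel functions above, implemented with NumPy.
    import numpy as np

    def linear_kernel(x, y):
        return np.dot(x, y)                           # K(x, y) = x . y

    def rbf_kernel(x, y, gamma=0.5):
        return np.exp(-gamma * np.sum((x - y) ** 2))  # K(x, y) = exp(-gamma*||x-y||^2)

    def polynomial_kernel(x, y, c=1.0, d=3):
        return (np.dot(x, y) + c) ** d                # K(x, y) = (x . y + c)^d

    x = np.array([1.0, 2.0])
    y = np.array([2.0, 1.0])
    print(linear_kernel(x, y), rbf_kernel(x, y), polynomial_kernel(x, y))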
Advantages of SVMs
SVMs are powerful machine learning algorithms that have the following advantages:

 Effective in high-dimensional spaces. High-dimensional data refers to data in which the number of features is larger than the number of observations, i.e., data points. SVMs perform well even when the number of features is larger than the number of samples. They can handle high-dimensional data efficiently, making them suitable for applications with a large number of features.
 Resistant to overfitting. SVMs are less prone to overfitting compared to
other algorithms, like decision trees -- overfitting is where a model
performs extremely well on the training data but becomes too specific
to that data and can't generalize to new data. SVMs' use of the margin
maximization principle helps in generalizing well to unseen data.
 Versatile. SVMs can be applied to both classification and regression
problems. They support different kernel functions, enabling flexibility in
capturing complex relationships in the data. This versatility makes SVMs
applicable to a wide range of tasks.
 Effective in cases of limited data. SVMs can work well even when the
training data set is small. The use of support vectors ensures that only a
subset of data points influences the decision boundary, which can be
beneficial when data is limited.
 Ability to handle nonlinear data. SVMs can implicitly handle non-
linearly separable data by using kernel functions. The kernel trick
enables SVMs to transform the input space into a higher-dimensional
feature space, making it possible to find linear decision boundaries.
Disadvantages of SVMs
While support vector machines are popular for the reasons listed above, they also
come with some limitations and potential issues:

 Computationally intensive. SVMs can be computationally expensive, especially when dealing with large data sets. The training time and memory requirements increase significantly with the number of training samples.
 Sensitive to parameter tuning. SVMs have parameters such as the
regularization parameter and the choice of kernel function. The
performance of SVMs can be sensitive to these parameter settings.
Improper tuning can lead to suboptimal results or longer training times.
 Lack of probabilistic outputs. SVMs provide binary classification outputs
and do not directly estimate class probabilities. Additional techniques,
such as Platt scaling or cross-validation, are needed to obtain probability
estimates.
 Difficulty in interpreting complex models. SVMs can create complex
decision boundaries, especially when using nonlinear kernels. This
complexity may make it challenging to interpret the model and
understand the underlying patterns in the data.
 Scalability issues. SVMs may face scalability issues when applied to
extremely large data sets. Training an SVM on millions of samples can
become impractical due to memory and computational constraints.

 Bayesian Belief Network:

A Bayesian Belief Network (BBN), also known as a Bayesian network, is a type of probabilistic graphical model: a graphical representation of the probabilistic relationships among a set of variables.

BBNs are used in machine learning and artificial intelligence to model and reason about uncertainty, make predictions, and perform probabilistic inference. They are particularly useful in situations where there is uncertainty or incomplete information about a system.

A Bayesian network can be used for building models from data and experts' opinions, and it consists of two parts:

o A directed acyclic graph (DAG)
o Tables of conditional probabilities.
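
As a hedged sketch (the tiny network and all numbers below are invented), these two parts combine to define a joint distribution over the variables:

    # Sketch of a tiny Bayesian network: Rain -> WetGrass (all numbers invented).
    # DAG: a single edge, Rain -> WetGrass.
    # CPTs: P(Rain) and P(WetGrass | Rain).
    p_rain = 0.2
    p_wet_given_rain = {True: 0.9, False: 0.1}

    # The joint probability factorizes along the DAG:
    # P(Rain, WetGrass) = P(Rain) * P(WetGrass | Rain)
    p_rain_and_wet = p_rain * p_wet_given_rain[True]

    # Inference by marginalization: P(WetGrass = True)
    p_wet = p_rain * p_wet_given_rain[True] + (1 - p_rain) * p_wet_given_rain[False]

    # Bayes' theorem answers the diagnostic query P(Rain | WetGrass = True)
    print(p_rain_and_wet / p_wet)   # 0.18 / 0.26 ~ 0.69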

 EM Algorithm:
The EM (Expectation-Maximization) algorithm is an iterative approach for estimating model parameters in the presence of missing or latent variables; related ideas underlie several unsupervised ML methods, such as the k-means clustering algorithm. Each iteration alternates between two steps. In the first step, we estimate the missing or latent variables; hence it is referred to as the expectation/estimation step (E-step). The other step optimizes the parameters of the model so that it can explain the data more clearly; it is known as the maximization step (M-step).
o Expectation step (E-step): It involves the estimation (guess) of all missing values in the dataset, so that after completing this step there are no missing values.
o Maximization step (M-step): This step uses the data estimated in the E-step to update the parameters.
o Repeat the E-step and M-step until the values converge.

The primary goal of the EM algorithm is to use the available observed data of the dataset to estimate the missing data of the latent variables, and then use that data to update the values of the parameters in the M-step.

Steps in EM Algorithm:
The EM algorithm is completed mainly in 4 steps: the initialization step, expectation step, maximization step, and convergence step. These steps are explained as follows (a code sketch follows the list):
o 1st step: The very first step is to initialize the parameter values. The system is provided with incomplete observed data, with the assumption that the data is obtained from a specific model.
o 2nd step: This step is known as the expectation or E-step. It is used to estimate or guess the values of the missing or incomplete data using the observed data. The E-step primarily updates the latent variables.
o 3rd step: This step is known as the maximization or M-step, where we use the complete data obtained in the 2nd step to update the parameter values. The M-step primarily updates the hypothesis.
o 4th step: The last step is to check whether the values of the latent variables are converging. If they are, stop the process; otherwise, repeat from step 2 until convergence occurs.
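
A compact sketch of these four steps for a one-dimensional, two-component Gaussian mixture (the data and initial values are invented, and fixed unit variances are assumed to keep the sketch short):

    # Sketch: EM for a 1-D mixture of two Gaussians (fixed unit variances assumed).
    import numpy as np

    rng = np.random.default_rng(0)
    data = np.concatenate([rng.normal(0, 1, 100), rng.normal(5, 1, 100)])

    mu = np.array([-1.0, 1.0])   # 1st step: initialize the parameters
    pi = np.array([0.5, 0.5])    # mixing weights

    def gaussian(x, mean):
        return np.exp(-0.5 * (x - mean) ** 2) / np.sqrt(2 * np.pi)

    for _ in range(50):          # 4th step simplified: a fixed number of iterations
        # 2nd step (E-step): responsibilities = posteriors of the latent component
        resp = pi * gaussian(data[:, None], mu)       # shape (n, 2)
        resp /= resp.sum(axis=1, keepdims=True)

        # 3rd step (M-step): update the parameters from the expected assignments
        nk = resp.sum(axis=0)
        mu = (resp * data[:, None]).sum(axis=0) / nk
        pi = nk / len(data)

    print(mu)   # close to the true means 0 and 5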

 Applications of EM algorithm:
The primary aim of the EM algorithm is to estimate the missing data of the latent variables through the observed data in datasets. The EM algorithm, or latent variable model, has a broad range of real-life applications in machine learning. These are as follows:

o The EM algorithm is applicable to data clustering in machine learning.
o It is often used in computer vision and NLP (natural language processing).
o It is used to estimate parameters in mixed models such as the Gaussian mixture model, and in quantitative genetics.
o It is also used in psychometrics for estimating the item parameters and latent abilities of item response theory models.
o It is also applicable in the medical and healthcare industry, for example in image reconstruction, and in structural engineering.
o It is used to estimate the parameters of a Gaussian density function.

Advantages of EM algorithm:

o The two basic steps of the EM algorithm, the E-step and the M-step, are very easy to implement for many machine learning problems.
o The likelihood is guaranteed not to decrease after each iteration.
o The M-step often has a closed-form solution.

Disadvantages of EM algorithm:

o The convergence of the EM algorithm can be very slow.
o It may converge only to a local optimum.
o It takes both forward and backward probabilities into consideration, in contrast to numerical optimization, which takes only forward probabilities.
