EM Algorithm

The Expectation-Maximization algorithm is an iterative method for finding maximum likelihood estimates of parameters in statistical models that depend on unobserved latent variables. It alternates between an expectation (E) step, which computes the expectation of the likelihood including the latent variables, and a maximization (M) step, which computes parameter estimates that maximize the expected likelihood found in the E step. This process is repeated until convergence. The algorithm is useful for problems with missing or hidden data, such as clustering and estimating Hidden Markov Models. The likelihood is guaranteed not to decrease with each iteration, and for many problems the E and M steps are easy to implement. However, it can be slow to converge and only finds local optima.


Expectation-Maximization Algorithm
In real-world applications of machine learning, it is very common that many relevant features are available for learning but only a small subset of them is observable. For a variable that is sometimes observable and sometimes not, we can use the instances in which it is observed for learning and then predict its value in the instances in which it is not observed. The Expectation-Maximization algorithm can also be used for latent variables (variables that are never directly observed and are instead inferred from the values of other observed variables) to predict their values, provided that the general form of the probability distribution governing those latent variables is known to us. This algorithm is at the base of many unsupervised clustering algorithms in machine learning.
It was proposed, explained, and given its name in a 1977 paper by Arthur Dempster, Nan Laird, and Donald Rubin. It is used to find local maximum-likelihood parameters of a statistical model in cases where latent variables are involved and the data is missing or incomplete.
Algorithm:
1. Given a set of incomplete data, start from a set of initial parameters.
2. Expectation step (E-step): Using the observed data of the dataset, estimate (guess) the values of the missing data.
3. Maximization step (M-step): Use the complete data generated in the expectation (E) step to update the parameters.
4. Repeat step 2 and step 3 until convergence (a minimal code sketch of these steps is given just below).
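To make these four steps concrete, below is a minimal, illustrative Python/NumPy sketch of EM for a two-component, one-dimensional Gaussian mixture. The function name, starting values, and tolerance are assumptions made for illustration; they are not prescribed by the text above.

import numpy as np

def em_gmm_1d(x, n_iter=100, tol=1e-6):
    # Step 1: start from a set of guessed parameters (weights, means, variances).
    w = np.array([0.5, 0.5])
    mu = np.array([x.min(), x.max()])
    var = np.array([x.var(), x.var()])
    prev_ll = -np.inf
    for _ in range(n_iter):
        # E-step: estimate the missing component labels as responsibilities
        # r[i, k] = P(component k | x_i, current parameters).
        pdf = np.exp(-(x[:, None] - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)
        weighted = w * pdf
        ll = np.log(weighted.sum(axis=1)).sum()            # current log-likelihood
        r = weighted / weighted.sum(axis=1, keepdims=True)
        # M-step: re-estimate the parameters from the "completed" data.
        nk = r.sum(axis=0)
        w = nk / len(x)
        mu = (r * x[:, None]).sum(axis=0) / nk
        var = (r * (x[:, None] - mu) ** 2).sum(axis=0) / nk
        # Step 4: stop once the log-likelihood has stopped improving.
        if ll - prev_ll < tol:
            break
        prev_ll = ll
    return w, mu, var

rng = np.random.default_rng(0)
data = np.concatenate([rng.normal(0.0, 1.0, 200), rng.normal(5.0, 1.0, 200)])
print(em_gmm_1d(data))

Each iteration the E-step fills in the hidden component memberships and the M-step re-fits the parameters, so the log-likelihood never decreases.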

The essence of the Expectation-Maximization algorithm is to use the available observed data of the dataset to estimate the missing data, and then use that completed data to update the values of the parameters. Let us understand the EM algorithm in detail.
 Initially, a set of initial values of the parameters is considered. A set of incomplete observed data is given to the system, with the assumption that the observed data comes from a specific model.
 The next step is known as the "Expectation" step or E-step. In this step, we use the observed data to estimate or guess the values of the missing or incomplete data. It is basically used to update the variables.
 The next step is known as the "Maximization" step or M-step. In this step, we use the complete data generated in the preceding "Expectation" step to update the values of the parameters. It is basically used to update the hypothesis.
 Finally, in the fourth step, it is checked whether the values are converging or not. If yes, we stop; otherwise we repeat step 2 and step 3, i.e. the "Expectation" step and the "Maximization" step, until convergence occurs.
 Flow chart for the EM algorithm (figure not included here).

Usage of EM algorithm –
 It can be used to fill in missing data in a sample.
 It can be used as the basis of unsupervised learning of clusters.
 It can be used for estimating the parameters of a Hidden Markov Model (HMM).
 It can be used for discovering the values of latent variables.
Advantages of EM algorithm –
 It is guaranteed that the likelihood will not decrease with each iteration.
 The E-step and M-step are often quite easy to implement for many problems.
 Solutions to the M-step often exist in closed form.

Disadvantages of EM algorithm –
 It has slow convergence.
 It converges to a local optimum only.
 It requires both the forward and backward probabilities (numerical optimization requires only the forward probability).

Bayesian Belief Network


A Bayesian Belief Network is a graphical representation of the probabilistic relationships among the random variables in a particular set. Each variable is conditionally independent of its non-descendants given its parents. Because the network factorizes the joint probability, each probability in a Bayesian Belief Network is derived from a condition, P(attribute | parent), i.e. the probability of an attribute given its parent attributes.
 Consider this example:

 In this example, we have an alarm 'A' (a node) installed in the house of a person 'gfg', which rings upon two events, burglary 'B' and fire 'F', which are the parent nodes of the alarm node. The alarm node is in turn the parent node of two person nodes, 'P1' and 'P2'.
 Upon hearing the alarm, 'P1' and 'P2' call the person 'gfg'. But there are a few caveats in this case: sometimes 'P1' may forget to call the person 'gfg' even after hearing the alarm, as he has a tendency to forget things quickly. Similarly, 'P2' sometimes fails to call the person 'gfg', as he is only able to hear the alarm from a certain distance.
Q) Find the probability that 'P1' is true (P1 has called 'gfg') and 'P2' is true (P2 has called 'gfg') when the alarm 'A' rang, but no burglary 'B' and no fire 'F' have occurred.
=> P(P1, P2, A, ~B, ~F) [where P1, P2, and A are 'true' events and '~B' and '~F' are 'false' events]
Burglary ‘B’ –
 P (B=T) = 0.001 (‘B’ is true i.e burglary has occurred)
 P (B=F) = 0.999  (‘B’ is false i.e burglary has not occurred)
Fire ‘F’ –
 P (F=T) = 0.002 (‘F’ is true i.e fire has occurred)
 P (F=F) = 0.998 (‘F’ is false i.e fire has not occurred)
Alarm ‘A’ –
B      F      P(A=T)    P(A=F)
T      T      0.95      0.05
T      F      0.94      0.06
F      T      0.29      0.71
F      F      0.001     0.999

 The alarm 'A' node can be 'true' or 'false' (i.e. it may have rung or may not have rung). It has two parent nodes, burglary 'B' and fire 'F', which can be 'true' or 'false' (i.e. may have occurred or may not have occurred) depending on different conditions.
 Person ‘P1’ –

A      P(P1=T)    P(P1=F)
T      0.95       0.05
F      0.05       0.95

 The person 'P1' node can be 'true' or 'false' (i.e. it may have called the person 'gfg' or not). It has a parent node, the alarm 'A', which can be 'true' or 'false' (i.e. it may have rung or may not have rung, upon burglary 'B' or fire 'F').

Person ‘P2’ –
A      P(P2=T)    P(P2=F)
T      0.80       0.20
F      0.01       0.99

 The person 'P2' node can be 'true' or 'false' (i.e. it may have called the person 'gfg' or not). It has a parent node, the alarm 'A', which can be 'true' or 'false' (i.e. it may have rung or may not have rung, upon burglary 'B' or fire 'F').
Solution: Considering the conditional probability tables above –
With respect to the question, P(P1, P2, A, ~B, ~F), we need the probability of 'P1', which we find with regard to its parent node, the alarm 'A'. To get the probability of 'P2', we again condition on its parent node, the alarm 'A'. We find the probability of the alarm 'A' node with regard to '~B' and '~F', since burglary 'B' and fire 'F' are the parent nodes of alarm 'A'.
From the conditional probability tables, we can deduce:
P(P1, P2, A, ~B, ~F)
= P(P1|A) * P(P2|A) * P(A|~B, ~F) * P(~B) * P(~F)
= 0.95 * 0.80 * 0.001 * 0.999 * 0.998
≈ 0.00076
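The same joint probability can be verified with a short script. This is only a sketch that hard-codes the conditional probability tables listed above; the dictionary names are illustrative.

# Conditional probability tables from the example above.
P_B = {True: 0.001, False: 0.999}                    # P(B)
P_F = {True: 0.002, False: 0.998}                    # P(F)
P_A = {(True, True): 0.95, (True, False): 0.94,      # P(A=T | B, F)
       (False, True): 0.29, (False, False): 0.001}
P_P1 = {True: 0.95, False: 0.05}                     # P(P1=T | A)
P_P2 = {True: 0.80, False: 0.01}                     # P(P2=T | A)

# Joint probability of P1=T, P2=T, A=T, B=F, F=F, factored along the network:
b, f, a = False, False, True
joint = P_P1[a] * P_P2[a] * P_A[(b, f)] * P_B[b] * P_F[f]
print(round(joint, 6))                               # ~0.000758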

Naïve Bayes Classifier

o The Naïve Bayes algorithm is a supervised learning algorithm, based on Bayes' theorem, that is used for solving classification problems.
o It is mainly used in text classification, which involves high-dimensional training datasets.
o The Naïve Bayes Classifier is one of the simplest and most effective classification algorithms; it helps in building fast machine learning models that can make quick predictions.
o It is a probabilistic classifier, which means it predicts on the basis of the probability of an object.
o Some popular applications of the Naïve Bayes algorithm are spam filtering, sentiment analysis, and classifying articles.
Why is it called Naïve Bayes?
The name Naïve Bayes combines two words, Naïve and Bayes, which can be described as:

o Naïve: It is called naïve because it assumes that the occurrence of a certain feature is independent of the occurrence of the other features. For example, if a fruit is identified on the basis of color, shape, and taste, then a red, spherical, and sweet fruit is recognized as an apple. Hence each feature individually contributes to identifying it as an apple, without depending on the others.
o Bayes: It is called Bayes because it depends on the principle of Bayes' theorem.

Bayes' Theorem:
o Bayes' theorem is also known as Bayes' rule or Bayes' law. It is used to determine the probability of a hypothesis with prior knowledge, and it depends on conditional probability.
o The formula for Bayes' theorem is given as:

P(A|B) = P(B|A) * P(A) / P(B)

Where,

P(A|B) is the Posterior probability: the probability of hypothesis A given the observed event B.

P(B|A) is the Likelihood: the probability of the evidence given that the hypothesis is true.

P(A) is the Prior probability: the probability of the hypothesis before observing the evidence.

P(B) is the Marginal probability: the probability of the evidence.


Working of Naïve Bayes' Classifier:
The working of the Naïve Bayes classifier can be understood with the help of the example below:

Suppose we have a dataset of weather conditions and a corresponding target variable "Play". Using this dataset, we need to decide whether we should play or not on a particular day according to the weather conditions. To solve this problem, we follow the steps below:

1. Convert the given dataset into frequency tables.
2. Generate a likelihood table by finding the probabilities of the given features.
3. Use Bayes' theorem to calculate the posterior probability.

Problem: If the weather is sunny, should the player play or not?

Solution: To solve this, first consider the dataset below:

     Outlook    Play
0    Rainy      Yes
1    Sunny      Yes
2    Overcast   Yes
3    Overcast   Yes
4    Sunny      No
5    Rainy      Yes
6    Sunny      Yes
7    Overcast   Yes
8    Rainy      No
9    Sunny      No
10   Sunny      Yes
11   Rainy      No
12   Overcast   Yes
13   Overcast   Yes
Frequency table for the weather conditions:

Weather    Yes   No
Overcast   5     0
Rainy      2     2
Sunny      3     2
Total      10    4

Likelihood table for the weather conditions:

Weather    No            Yes           P(Weather)
Overcast   0             5             5/14 = 0.35
Rainy      2             2             4/14 = 0.29
Sunny      2             3             5/14 = 0.35
All        4/14 = 0.29   10/14 = 0.71

Applying Bayes' theorem:

P(Yes|Sunny) = P(Sunny|Yes) * P(Yes) / P(Sunny)

P(Sunny|Yes) = 3/10 = 0.3

P(Sunny) = 5/14 = 0.35

P(Yes) = 10/14 = 0.71

So P(Yes|Sunny) = 0.3 * 0.71 / 0.35 ≈ 0.60

P(No|Sunny) = P(Sunny|No) * P(No) / P(Sunny)

P(Sunny|No) = 2/4 = 0.5

P(No) = 4/14 = 0.29

P(Sunny) = 5/14 = 0.35

So P(No|Sunny) = 0.5 * 0.29 / 0.35 ≈ 0.41

As we can see from the above calculation, P(Yes|Sunny) > P(No|Sunny).

Hence, on a sunny day, the player can play the game.
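The same counting argument can be checked with a few lines of Python. This is only an illustrative sketch: the two lists simply transcribe the dataset table, and with exact fractions it gives P(Yes|Sunny) = 0.60 and P(No|Sunny) = 0.40, matching the rounded figures above.

from collections import Counter

# Weather dataset from the table above.
outlook = ["Rainy", "Sunny", "Overcast", "Overcast", "Sunny", "Rainy",
           "Sunny", "Overcast", "Rainy", "Sunny", "Sunny", "Rainy",
           "Overcast", "Overcast"]
play = ["Yes", "Yes", "Yes", "Yes", "No", "Yes", "Yes", "Yes",
        "No", "No", "Yes", "No", "Yes", "Yes"]

n = len(play)
class_counts = Counter(play)                  # {'Yes': 10, 'No': 4}
joint_counts = Counter(zip(outlook, play))    # e.g. ('Sunny', 'Yes'): 3

def posterior(weather, label):
    # P(label | weather) via Bayes' theorem on the frequency tables.
    prior = class_counts[label] / n                              # P(label)
    likelihood = joint_counts[(weather, label)] / class_counts[label]
    evidence = sum(1 for o in outlook if o == weather) / n       # P(weather)
    return likelihood * prior / evidence

print(posterior("Sunny", "Yes"))   # 0.6
print(posterior("Sunny", "No"))    # 0.4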

Advantages of Naïve Bayes Classifier:

o Naïve Bayes is one of the fastest and easiest ML algorithms for predicting the class of a dataset.
o It can be used for binary as well as multi-class classification.
o It performs well in multi-class predictions compared to many other algorithms.
o It is a very popular choice for text classification problems.

Disadvantages of Naïve Bayes Classifier:


o Naive Bayes assumes that all features are independent or
unrelated, so it cannot learn the relationship between features.

Applications of Naïve Bayes Classifier:

o It is used for Credit Scoring.


o It is used in medical data classification.
o It can be used in real-time predictions because Naïve Bayes
Classifier is an eager learner.
o It is used in Text classification such as Spam
filtering and Sentiment analysis.

Types of Naïve Bayes Model:

There are three types of Naïve Bayes model, which are given below:

o Gaussian: The Gaussian model assumes that features follow a normal distribution. This means that if predictors take continuous values instead of discrete ones, the model assumes these values are sampled from a Gaussian distribution.
o Multinomial: The Multinomial Naïve Bayes classifier is used when the data is multinomially distributed. It is primarily used for document classification problems, i.e. determining which category a particular document belongs to, such as sports, politics, education, etc. The classifier uses the frequencies of words as the predictors.
o Bernoulli: The Bernoulli classifier works similarly to the Multinomial classifier, but the predictor variables are independent Boolean variables, such as whether a particular word is present in a document or not. This model is also well known for document classification tasks. All three variants are sketched in code below.
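For reference, all three variants are available in scikit-learn (assuming that library is installed). The toy feature matrices below are made-up values used purely for illustration.

import numpy as np
from sklearn.naive_bayes import GaussianNB, MultinomialNB, BernoulliNB

y = np.array([0, 0, 1, 1])

# Continuous features -> Gaussian Naive Bayes.
X_cont = np.array([[1.0, 2.1], [0.9, 1.8], [5.2, 6.0], [5.0, 5.9]])
print(GaussianNB().fit(X_cont, y).predict([[1.1, 2.0]]))       # likely class 0

# Word-count features -> Multinomial Naive Bayes.
X_counts = np.array([[3, 0, 1], [2, 0, 0], [0, 4, 2], [0, 3, 1]])
print(MultinomialNB().fit(X_counts, y).predict([[0, 2, 1]]))   # likely class 1

# Binary word-presence features -> Bernoulli Naive Bayes.
X_bin = (X_counts > 0).astype(int)
print(BernoulliNB().fit(X_bin, y).predict([[1, 0, 0]]))        # likely class 0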

Bayes Theorem
Machine Learning is one of the most rapidly emerging technologies of Artificial Intelligence. We are living in the 21st century, which is driven by new technologies and gadgets, some of which are yet to be used and some of which are already at their full potential. Many concepts make machine learning a better technology, such as supervised learning, unsupervised learning, reinforcement learning, perceptron models, neural networks, etc. In this article, "Bayes Theorem in Machine Learning", we will discuss another very important concept of machine learning: Bayes' theorem. Before starting this topic, you should gain an essential understanding of this theorem, such as what exactly Bayes' theorem is, why it is used in machine learning, and examples of Bayes' theorem in machine learning. So, let's start with a brief introduction to Bayes' theorem.

Introduction to Bayes Theorem in Machine Learning
Bayes' theorem is named after an English statistician, philosopher, and Presbyterian minister, Thomas Bayes, who worked in the 18th century. Bayes contributed to decision theory, which is extensively used in important mathematical concepts such as probability. Bayes' theorem is also widely used in machine learning, where we need to predict classes precisely and accurately. An important concept based on Bayes' theorem, the Bayesian method, is used to calculate conditional probability in machine learning applications that include classification tasks. Further, a simplified version of Bayes' theorem (Naïve Bayes classification) is also used to reduce computation time and the average cost of projects.

Bayes' theorem is also known by other names, such as Bayes' rule or Bayes' law. Bayes' theorem helps to determine the probability of an event based on prior knowledge. It is used to calculate the probability of one event occurring given that another event has already occurred. It relates conditional probability and marginal probability.
In simple words, we can say that Bayes' theorem helps produce more accurate results.

Bayes' theorem is used to estimate the precision of values and provides a method for calculating conditional probability. Although it is, on paper, a simple calculation, it is used to easily calculate the conditional probability of events where intuition often fails. Some data scientists assume that Bayes' theorem is mostly used in the financial industries, but this is not the case. Beyond finance, Bayes' theorem is also extensively applied in health and medicine, research and survey industries, the aeronautical sector, etc.

What is Bayes Theorem?

Bayes' theorem is one of the most popular machine learning concepts. It helps to calculate the probability of one event occurring, with uncertain knowledge, given that another event has already occurred.

Bayes' theorem can be derived using the product rule and the conditional probability of event X given event Y:

o According to the product rule, we can express the probability of event X together with a known event Y as follows:

P(X ∩ Y) = P(X|Y) P(Y)       {equation 1}

o Further, the probability of event Y together with a known event X is:

P(X ∩ Y) = P(Y|X) P(X)       {equation 2}

Mathematically, Bayes' theorem is obtained by equating the right-hand sides of the two equations and dividing by P(Y):

P(X|Y) = P(Y|X) P(X) / P(Y)

The above equation is called Bayes' rule or Bayes' theorem.

o P(X|Y) is called the posterior, which we need to calculate. It is defined as the updated probability after considering the evidence.
o P(Y|X) is called the likelihood. It is the probability of the evidence when the hypothesis is true.
o P(X) is called the prior probability, the probability of the hypothesis before considering the evidence.
o P(Y) is called the marginal probability. It is defined as the probability of the evidence under any consideration.

Hence, Bayes Theorem can be written as:

posterior = likelihood * prior / evidence
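Written as code, this relationship is a one-line function. The numbers in the example are arbitrary illustrative values, not figures taken from the text.

def bayes_posterior(likelihood, prior, evidence):
    # posterior = likelihood * prior / evidence
    return likelihood * prior / evidence

# Example: P(Y|X) = 0.9, P(X) = 0.01, P(Y) = 0.05  ->  P(X|Y) = 0.18
print(bayes_posterior(0.9, 0.01, 0.05))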

Prerequisites for Bayes Theorem

While studying Bayes' theorem, we need to understand a few important concepts. These are as follows:

1. Experiment

An experiment is defined as a planned operation carried out under controlled conditions, such as tossing a coin, drawing a card, or rolling a die.

2. Sample Space
What we get as a result during an experiment is called an outcome, and the set of all possible outcomes of an experiment is known as the sample space. For example, if we are rolling a die, the sample space will be:

S1 = {1, 2, 3, 4, 5, 6}

Similarly, if our experiment consists of tossing a coin and recording its outcome, then the sample space will be:

S2 = {Head, Tail}

3. Event

An event is defined as a subset of the sample space of an experiment. Further, it is also called a set of outcomes.

Assume that in our experiment of rolling a die, there are two events A and B such that:

A = Event that an even number is obtained = {2, 4, 6}

B = Event that a number greater than 4 is obtained = {5, 6}

o Probability of the event A, P(A) = Number of favourable outcomes / Total number of possible outcomes
P(A) = 3/6 = 1/2 = 0.5
o Similarly, probability of the event B, P(B) = Number of favourable outcomes / Total number of possible outcomes
P(B) = 2/6 = 1/3 ≈ 0.333
o Union of events A and B:
A ∪ B = {2, 4, 5, 6}
o Intersection of events A and B:
A ∩ B = {6}

o Disjoint events: If the intersection of events A and B is the empty set, then such events are known as disjoint events, also called mutually exclusive events.

4. Random Variable:
A random variable is a real-valued function that maps the sample space of an experiment to the real line. A random variable takes on various values, each with some probability. Despite its name, it is neither random nor a variable: it behaves as a function, and it can be discrete, continuous, or a combination of both.

5. Exhaustive Events:

As the name suggests, a set of events in which at least one event must occur is called an exhaustive set of events for an experiment.

Thus, two events A and B are said to be exhaustive if either A or B definitely occurs; if they are also mutually exclusive, exactly one of them occurs. For example, while tossing a coin, the outcome will be either a Head or a Tail.

6. Independent Events:

Two events are said to be independent when the occurrence of one event does not affect the occurrence of the other. In simple words, the probability of the outcome of one event does not depend on the other.

Mathematically, two events A and B are said to be independent if:

P(A ∩ B) = P(AB) = P(A) * P(B)

7. Conditional Probability:

Conditional probability is defined as the probability of an event A, given that another event B has already occurred (i.e. A conditioned on B). It is written P(A|B) and defined as:

P(A|B) = P(A ∩ B) / P(B)

8. Marginal Probability:

Marginal probability is defined as the probability of an event A occurring irrespective of any other event B. It is the probability of the evidence under any consideration.

P(A) = P(A|B) * P(B) + P(A|~B) * P(~B)

Here ~B represents the event that B does not occur.
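Using the die events A = {2, 4, 6} and B = {5, 6} defined earlier, both the conditional-probability and the marginal-probability formulas can be checked with a small illustrative sketch:

from fractions import Fraction

S = {1, 2, 3, 4, 5, 6}                 # sample space of a fair die
A = {2, 4, 6}                          # even number
B = {5, 6}                             # number greater than 4
not_B = S - B

P = lambda E: Fraction(len(E), len(S)) # uniform probability on the die

# Conditional probability: P(A|B) = P(A ∩ B) / P(B)
print(P(A & B) / P(B))                                                # 1/2

# Marginal probability: P(A) = P(A|B) P(B) + P(A|~B) P(~B)
print(P(A & B) / P(B) * P(B) + P(A & not_B) / P(not_B) * P(not_B))    # 1/2

Both lines print 1/2, which matches P(A) = 3/6 computed earlier.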

How to apply Bayes Theorem or Bayes rule in Machine Learning?
Bayes' theorem lets us calculate the single term P(B|A) in terms of P(A|B), P(B), and P(A). This rule is very helpful in scenarios where we have good estimates of P(A|B), P(B), and P(A) and need to determine the fourth term.

The Naïve Bayes classifier is one of the simplest applications of Bayes' theorem, used in classification algorithms to assign data points to classes quickly and accurately.

Let's understand the use of Bayes' theorem in machine learning with the example below.

Suppose we have a feature vector A with i attributes, that is:

A = A1, A2, A3, A4, ..., Ai

Further, we have n classes, represented as C1, C2, C3, C4, ..., Cn.

These two pieces are given to us, and our machine learning classifier has to predict the class of A, choosing the best possible class. With the help of Bayes' theorem, we can write this as:

P(Ci|A) = [ P(A|Ci) * P(Ci) ] / P(A)

Here:

P(A) is the class-independent term.

P(A) remains constant across all classes, meaning its value does not change with the class. Therefore, to maximize P(Ci|A), we only have to maximize the term P(A|Ci) * P(Ci).

With n classes on the probability list, let's assume that every class is equally likely to be the right answer. Considering this, we can say that:

P(C1) = P(C2) = P(C3) = P(C4) = ... = P(Cn)

This assumption helps us reduce the computation cost as well as the time.


This is how Bayes' theorem plays a significant role in machine learning, and the Naïve Bayes assumption simplifies the conditional probability computation without greatly affecting precision. Under that assumption, we can conclude that:

P(A|Ci) = P(A1|Ci) * P(A2|Ci) * P(A3|Ci) * ... * P(An|Ci)

Hence, by using Bayes' theorem in machine learning, we can easily express the probability of a larger event in terms of the probabilities of smaller events.
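A small sketch of that maximization, assuming the per-class conditional probabilities P(Aj|Ci) have already been estimated; every number below is an illustrative placeholder rather than a value from the text.

# Illustrative priors and per-attribute conditional probabilities.
priors = {"C1": 0.5, "C2": 0.5}        # P(Ci), here assumed equal
cond = {                               # P(Aj | Ci), made-up values
    "C1": [0.8, 0.3, 0.6],
    "C2": [0.2, 0.7, 0.4],
}

def score(ci):
    # P(A|Ci) * P(Ci) under the naive independence assumption.
    p = priors[ci]
    for p_attr in cond[ci]:
        p *= p_attr
    return p

# P(A) is the same for every class, so the class maximizing score(Ci)
# also maximizes the posterior P(Ci|A).
best = max(priors, key=score)
print(best, score(best))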
