
UNIT III BAYESIAN LEARNING

Basic Probability Notation - Inference - Independence - Bayes' Rule. Bayesian Learning: Maximum Likelihood and Least Squared Error hypothesis - Maximum Likelihood hypotheses for predicting probabilities - Minimum Description Length principle - Bayes optimal classifier - Naïve Bayes classifier - Bayesian Belief networks - EM algorithm.

Basic Probability Theory

The probability of an event can be calculated as the number of outcomes in which the event occurs divided by the total number of possible outcomes. Suppose we toss a fair coin; the probability of getting a head can be calculated using the formula below:

P(H) = Number of ways a head can occur / Total number of possible outcomes

P(H) = 1/2 = 0.5

where P(H) is the probability of getting a head as the outcome when tossing the coin.
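A minimal Python sketch of this idea, assuming a fair coin simulated with the standard random module: the relative frequency of heads approaches 0.5 as the number of tosses grows.

    import random

    def estimate_head_probability(num_tosses=100_000):
        """Estimate P(H) empirically by simulating fair coin tosses."""
        heads = sum(random.choice(["H", "T"]) == "H" for _ in range(num_tosses))
        return heads / num_tosses

    print(estimate_head_probability())   # close to 0.5 for a large number of tosses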

Basic definitions and rules


The probability of a statement A, denoted P(A), is a real number between 0 and 1.

P(A) = 1 indicates absolute certainty that A is true,
P(A) = 0 indicates absolute certainty that A is false,
and values between 0 and 1 correspond to varying degrees of certainty.

Joint Probability: the probability that two random events occur simultaneously, written P(A ∩ B). In general it is given by the product rule, P(A ∩ B) = P(A)·P(B|A); when A and B are independent this reduces to

P(A ∩ B) = P(A)·P(B)

Where:

P(A ∩ B) = Probability that events A and B both occur

P(A) = Probability of event A

P(B) = Probability of event B

Conditional Probability: the probability of event A given that event B has occurred.

The probability of an event A conditioned on an event B is denoted and defined as:

P(A|B) = P(A ∩ B) / P(B)

Similarly, P(B|A) = P(A ∩ B) / P(A). We can therefore write the joint probability of A and B as P(A ∩ B) = P(A)·P(B|A), which means: "The chance of both things happening is the chance that the first one happens, multiplied by the chance that the second one happens given that the first has happened."

The basic rules of probability theory:

• P(A) ∈ [0, 1]

• Product rule: P(A, B) = P(A|B)·P(B)

• Sum rule: P(A) + P(¬A) = 1

• Two statements A and B are independent if: P(A, B) = P(A)·P(B)

• Marginalization: P(B) = Σᵢ P(Aᵢ, B), where the events Aᵢ are mutually exclusive and exhaustive

• Any basic rule can be made conditional on additional information. For example, it follows from the product rule that P(A, B|C) = P(A|B, C)·P(B|C)
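A small Python sketch of these rules, using an illustrative joint distribution over two binary variables (the numbers are made up for the example): it checks the product rule, the sum rule and marginalization.

    # Illustrative joint distribution P(A, B) over two binary variables
    # (the values are hypothetical and sum to 1).
    joint = {
        (True, True): 0.12, (True, False): 0.28,
        (False, True): 0.18, (False, False): 0.42,
    }

    def marginal_B(b):
        """Marginalization: P(B=b) = sum over a of P(A=a, B=b)."""
        return sum(p for (a, bb), p in joint.items() if bb == b)

    def conditional_A_given_B(a, b):
        """Conditional probability: P(A=a | B=b) = P(A=a, B=b) / P(B=b)."""
        return joint[(a, b)] / marginal_B(b)

    # Product rule check: P(A, B) should equal P(A|B) * P(B)
    a, b = True, True
    print(joint[(a, b)], conditional_A_given_B(a, b) * marginal_B(b))   # both ≈ 0.12

    # Sum rule: P(B=True) + P(B=False) = 1
    print(marginal_B(True) + marginal_B(False))   # ≈ 1.0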

Independence

• Two events are said to be independent if the probability of their intersection is equal to the product of their individual probabilities, i.e. P(A ∩ B) = P(A)·P(B).

• Equivalently, the conditional probability of A given B is the same as the unconditional probability of A, i.e. P(A|B) = P(A). Similarly, P(B|A) = P(B).

• This means that knowing B has occurred tells us nothing about how likely A is, and knowing A has occurred tells us nothing about how likely B is.

When two events are independent, neither event is influenced by the fact that the other event has happened.
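A short sketch, using two fair dice as a hypothetical example, that checks independence numerically by comparing P(A ∩ B) with P(A)·P(B):

    from itertools import product
    from fractions import Fraction

    # Sample space: all ordered outcomes of rolling two fair dice.
    outcomes = list(product(range(1, 7), repeat=2))

    def prob(event):
        """Probability of an event = favourable outcomes / total outcomes."""
        return Fraction(sum(1 for o in outcomes if event(o)), len(outcomes))

    A = lambda o: o[0] % 2 == 0     # first die shows an even number
    B = lambda o: o[1] > 4          # second die shows 5 or 6

    p_a, p_b = prob(A), prob(B)
    p_ab = prob(lambda o: A(o) and B(o))

    # Independent events: P(A ∩ B) equals P(A) * P(B)
    print(p_ab == p_a * p_b)        # True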

Bayes Rule

In Bayesian learning we apply Bayes' Rule to find the solution, i.e. the parameter vectors. So what is Bayes' Rule? To understand it, let us take two events A and B:

P(A) → Probability of event A occurring

P(B) → Probability of event B occurring

P(A ∩ B) → Probability of events A and B occurring simultaneously

P(A|B) → Probability of event A occurring given that event B has already occurred

P(B|A) → Probability of event B occurring given that event A has already occurred

Now, from the definition of conditional probability above, we have the following:

P(A ∩ B) = P(A)·P(B|A) ….(1)

P(A ∩ B) = P(B)·P(A|B) ….(2)

Combining equations (1) and (2) we get:

P(A)·P(B|A) = P(B)·P(A|B)

⇒ P(A|B) = P(A)·P(B|A) / P(B)

Thus Bayes' Rule describes the probability of an event based on prior knowledge of conditions that might be related to the event. For example, if cancer is related to age, then, using Bayes' theorem, a person's age can be used to assess the probability that they have cancer more accurately than an assessment made without knowledge of the person's age.
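A minimal worked sketch of the rule P(A|B) = P(A)·P(B|A) / P(B) applied to the cancer-and-age example, with hypothetical numbers chosen purely for illustration:

    # Hypothetical numbers, chosen only to illustrate Bayes' Rule.
    p_cancer = 0.01                  # P(A): prior probability of having cancer
    p_over_65_given_cancer = 0.60    # P(B|A): probability a cancer patient is over 65
    p_over_65 = 0.20                 # P(B): probability a random person is over 65

    # Bayes' Rule: P(A|B) = P(A) * P(B|A) / P(B)
    p_cancer_given_over_65 = p_cancer * p_over_65_given_cancer / p_over_65
    print(p_cancer_given_over_65)    # ≈ 0.03: knowing the age triples the estimate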

Bayesian Inference

Bayesian inference is a technique in machine learning that enables algorithms to make predictions by updating their prior knowledge based on new evidence, using Bayes' theorem.

But what is Bayes' theorem?

It describes the probability of event A, given that another event, B, has occurred. The formula is given below:

P(A|B) = P(B|A)·P(A) / P(B)

P(A): prior → the probability that event A occurs without any knowledge of other events.

P(B): evidence → the normalizing constant, the probability of observing event B, which makes the posterior a valid probability.

P(A|B): posterior → the probability of event A occurring, given that event B occurs.

P(B|A): likelihood → the probability of event B occurring if A occurs.
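A minimal sketch of Bayesian updating for a binary hypothesis (here a hypothetical spam-filtering scenario with made-up likelihoods): the posterior after each observation becomes the prior for the next.

    def update(prior, likelihood_if_true, likelihood_if_false):
        """One step of Bayesian updating for a binary hypothesis.

        posterior = P(H|E) = P(E|H) * P(H) / P(E), where
        P(E) = P(E|H) * P(H) + P(E|not H) * P(not H).
        """
        evidence = likelihood_if_true * prior + likelihood_if_false * (1 - prior)
        return likelihood_if_true * prior / evidence

    # Hypothesis H: "this email is spam". Numbers are hypothetical.
    prior = 0.2                      # prior belief that an email is spam
    observations = [
        (0.9, 0.1),   # word "winner" appears: P(word|spam)=0.9, P(word|not spam)=0.1
        (0.7, 0.3),   # sender is unknown
    ]

    for like_spam, like_ham in observations:
        prior = update(prior, like_spam, like_ham)   # posterior becomes the new prior
        print(round(prior, 3))       # belief rises as evidence accumulates: 0.692, 0.84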

Advantages of Bayesian Inference

Bayesian inference has multiple advantages, including but not limited to:

1. Flexibility. Bayesian inference can be applied to both linear and non-linear models and to various machine learning problems such as regression, classification, clustering, natural language processing and more.
2. More intuitive. The transition from prior to posterior knowledge using new data mirrors the way human beings update their beliefs with new information. For instance, someone who believes it will snow tomorrow and then checks the weather forecast will revise that belief accordingly.
3. Interpretable. The probability distributions generated over the possible values of the model predictions can be easily interpreted, which helps decision-makers act in line with their tolerance for risk.
Applications of Bayesian Inference in Machine Learning

Below are some real-world applications of Bayesian inference in machine learning:

1. Credit card fraud detection. Bayesian inference can be applied to detect fraudulent activities. The process begins with a prior belief about the probability of a transaction being fraudulent, based on historical data. Then, as new data becomes available, such as the transaction amount or the customer's purchase history, the prior is updated using Bayes' theorem to obtain a posterior probability that the transaction is fraudulent.
2. Medical diagnosis. In medical diagnosis, Bayes' theorem is used to analyze data from previous cases and estimate the likelihood of a patient having a given disease. Bayesian methods can take into account various factors that could influence the result and generate probabilistic estimates that are more explainable than binary results alone.
3. Image processing. Bayesian inference has multiple applications in image processing. For instance, it can be used to remove noise from images by applying techniques such as Markov Chain Monte Carlo together with Bayes' theorem.
4. Speech processing. Bayesian nonparametric clustering (BNC) is used in nonparametric hierarchical neural networks to perform speech and emotion recognition, and has been reported to outperform other state-of-the-art models on similar tasks.
