G.C. Calafiore (Politecnico di Torino)
Bayesian Inference
1 Bayes’ rule
2 Bayesian estimators
Example: estimating a Bernoulli parameter
    p(pregnant|HCG+) = p(pregnant) p(HCG+|pregnant) / p(HCG+)
This is called the minimum mean square error estimator (MMSE estimator),
because it minimizes the average squared error.
In words, the estimate θ̂ is the value which divides the probability mass into
equal proportions:
    ∫_{−∞}^{θ̂} p(θ|D) dθ = 1/2,
which is the definition of the median of the posterior pdf.
A well-known fact; see, e.g., J.B.S. Haldane, “Note on the median of a multivariate distribution,” Biometrika, vol. 35, pp. 414–417, 1948.
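For completeness, here is a brief sketch of why the posterior median minimizes the expected absolute loss (assuming a continuous posterior with cumulative distribution function F(θ|D)):

    E{|θ − θ̂| | D} = ∫_{−∞}^{θ̂} (θ̂ − θ) p(θ|D) dθ + ∫_{θ̂}^{+∞} (θ − θ̂) p(θ|D) dθ.

Differentiating with respect to θ̂ and setting the derivative to zero gives F(θ̂|D) − (1 − F(θ̂|D)) = 0, that is F(θ̂|D) = 1/2, which is precisely the median condition above.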
Estimation
Maximum A-Posteriori Estimator (MAP)
For the hit-or-miss loss

    L(θ − θ̂) = 0 if |θ − θ̂| ≤ δ,    1 if |θ − θ̂| > δ,

we have

    E{L(θ − θ̂)|D} = ∫_{−∞}^{θ̂−δ} p(θ|D) dθ + ∫_{θ̂+δ}^{+∞} p(θ|D) dθ
                  = 1 − ∫_{θ̂−δ}^{θ̂+δ} p(θ|D) dθ.

This is minimized by maximizing ∫_{θ̂−δ}^{θ̂+δ} p(θ|D) dθ. For small δ this integral is approximately 2δ p(θ̂|D), so the minimizer is (approximately) the value at which the posterior p(θ|D) attains its maximum, i.e., its mode.
Probabilistic model:
▶ p(θ): the prior
▶ p(D|θ): the likelihood
▶ p(θ|D) = const. × p(D|θ) p(θ): the posterior.
Estimators (a numerical comparison is sketched right after this list):
▶ θ̂MMSE is the mean of the posterior p(θ|D). It is the value which minimizes the expected quadratic loss.
▶ θ̂MAE is the median of the posterior p(θ|D). It is the value which minimizes the expected absolute loss.
▶ θ̂MAP is the maximum (i.e., peak, or mode) of the posterior p(θ|D). It is (approximately) the value which minimizes the expected hit-or-miss loss.
▶ θ̂ML is the maximum (i.e., peak, or mode) of the likelihood function p(D|θ).
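As a numerical illustration of these four estimators, here is a minimal sketch, assuming numpy; the sample size N = 10, the count N1 = 7 and the Beta(2, 2) prior are illustrative choices, not values from the slides.

    import numpy as np

    # Hypothetical data: N Bernoulli trials with N1 successes, and a Beta(a, b) prior on theta.
    N, N1 = 10, 7
    a, b = 2.0, 2.0

    theta = np.linspace(1e-6, 1 - 1e-6, 100001)                  # grid over (0, 1)
    log_like = N1 * np.log(theta) + (N - N1) * np.log(1 - theta)
    log_prior = (a - 1) * np.log(theta) + (b - 1) * np.log(1 - theta)

    post = np.exp(log_like + log_prior)                          # unnormalized posterior on the grid
    w = post / post.sum()                                        # discrete posterior weights
    cdf = np.cumsum(w)                                           # approximate posterior cdf

    theta_ML = theta[np.argmax(log_like)]                        # mode of the likelihood
    theta_MAP = theta[np.argmax(post)]                           # mode of the posterior
    theta_MMSE = (theta * w).sum()                               # mean of the posterior
    theta_MAE = theta[np.searchsorted(cdf, 0.5)]                 # median of the posterior

    print(theta_ML, theta_MAP, theta_MMSE, theta_MAE)
    # Closed forms: ML = N1/N = 0.7, MAP = (N1+a-1)/(N+a+b-2) = 2/3, MMSE = (N1+a)/(N+a+b) = 9/14.

On this example the Bayesian estimates (MAP, MMSE, MAE) are all pulled toward the prior mean 1/2 relative to θ̂ML = 0.7.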
If we know that the experiment is actually a coin toss, we can consider a prior on θ which is somewhat concentrated around 1/2, instead of the uniform prior on [0, 1] that we implicitly assumed in the ML estimation.
A typical choice of prior for the Bernoulli likelihood is the Beta prior, Beta(θ|a, b) ∝ θ^(a−1) (1 − θ)^(b−1), which has

    mean = a/(a + b),    mode = (a − 1)/(a + b − 2),    variance = ab/((a + b)² (a + b + 1)).
When the prior and the posterior have the same form, we say that the prior
is a conjugate prior for the corresponding likelihood. In the case of the
Bernoulli likelihood, the conjugate prior is the beta distribution.
Since the posterior is again a Beta distribution, Beta(θ|N1 + a, N0 + b) with N0 = N − N1, the MAP estimate is its mode,

    θ̂MAP = (N1 + a − 1)/(N + a + b − 2).

In our example with a = b = 2, we obtain θ̂MAP = (N1 + 1)/(N + 2).
For the small-sample example with N = 5 and N1 = N, we see that the effect of an informative prior is to avoid the extreme estimate θ̂ML = N1/N = 1.
The MMSE estimate is given by the mean of the posterior distribution, i.e.,

    θ̂MMSE = E{θ|D} = (N1 + a)/(N + a + b).
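For concreteness, a minimal check of these closed-form expressions on the small-sample example (N = 5, N1 = N = 5, a = b = 2), in plain Python:

    # Closed-form estimates for the small-sample example: all N = 5 tosses came up heads.
    N, N1 = 5, 5
    a, b = 2, 2

    theta_ML = N1 / N                                # 1.0  (extreme estimate)
    theta_MAP = (N1 + a - 1) / (N + a + b - 2)       # 6/7  ≈ 0.857
    theta_MMSE = (N1 + a) / (N + a + b)              # 7/9  ≈ 0.778

    print(theta_ML, theta_MAP, theta_MMSE)

Both Bayesian estimates pull the extreme ML value 1 toward the prior mean 1/2, with the MMSE estimate shrinking slightly more than the MAP estimate.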
Suppose you are a woman in your 40s, and you decide to have a medical test
for breast cancer called a mammogram.
If the test is positive, what is the probability you have cancer?
That obviously depends on how reliable the test is. Suppose you are told the
test has a sensitivity of 80%, which means, if you have cancer, the test will
be positive with probability 0.8. In other words,
p(y = 1|x = 1) = 0.8
where y = 1 is the event the outcome of the mammogram is positive, and
x = 1 is the (hidden) event you have breast cancer.
Consider also the rate of “false positives” of the test, quantified as
p(y = 1|x = 0) = 0.1.
What is the probability that you have cancer (x = 1), given that the
mammogram is positive (y = 1)? That is, evaluate
p(x = 1|y = 1).
Many people conclude they are therefore 80% likely to have cancer. But this
is false!
It ignores the prior probability of having breast cancer, which fortunately is
quite low:
p(x = 1) = 0.004.
Ignoring this prior is called the base rate fallacy.
We compute the correct probability by using Bayes’ rule:
    p(x = 1|y = 1) = p(y = 1|x = 1) p(x = 1) / p(y = 1)
                   = p(y = 1|x = 1) p(x = 1) / [p(y = 1|x = 1) p(x = 1) + p(y = 1|x = 0) p(x = 0)]
                   = 0.8 × 0.004 / (0.8 × 0.004 + 0.1 × (1 − 0.004)) = 0.031.
In other words, if you test positive, you only have about a 3% chance of
actually having breast cancer!
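The same computation as a minimal Python sketch (a direct transcription of the numbers above; the function name bayes_posterior is just an illustrative choice):

    def bayes_posterior(prior, sensitivity, false_positive_rate):
        """Return p(x = 1 | y = 1) for a binary test, via Bayes' rule."""
        evidence = sensitivity * prior + false_positive_rate * (1 - prior)  # p(y = 1)
        return sensitivity * prior / evidence

    # Mammogram example: prior 0.004, sensitivity 0.8, false-positive rate 0.1.
    print(bayes_posterior(0.004, 0.8, 0.1))   # ≈ 0.031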
Naive Bayes classifiers
We next discuss how to classify vectors of n features x = (x1 , . . . , xn ) into K
classes C1 , . . . , CK . The output y of the model is thus a categorical variable
in {1, . . . , K }.
The classifier assigns to each input feature vector x = (x1, . . . , xn) a class probability p(y|x), y = 1, . . . , K.
For categorical features taking Q possible values, the naive Bayes model assumes p(x|y = c) = ∏_{j=1}^{n} Cat(xj|θjc), where θjc is a histogram over the Q possible values for xj in class c. That is, if xj ∼ Cat(xj|θjc), then p(xj = q|θjc) = θjc(q).
The chosen output class is the one that maximizes p(y|x), that is, ŷ = arg max_{k=1,...,K} p(y = k|x); a small numerical sketch follows.
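A minimal numpy sketch of this categorical naive Bayes classifier; the toy arrays X, y (with K = 2 classes and Q = 3 feature values) and the add-one (Laplace) smoothing are illustrative choices, not part of the slides.

    import numpy as np

    # Toy data: each row is a feature vector whose entries take values in {0, 1, 2}.
    X = np.array([[0, 2, 1],
                  [1, 2, 0],
                  [0, 1, 1],
                  [2, 0, 2]])
    y = np.array([0, 0, 1, 1])          # class labels in {0, ..., K-1}
    K, Q = 2, 3
    n = X.shape[1]

    # Estimate the prior p(y = c) and the histograms theta[j, c, q] = p(x_j = q | y = c),
    # with add-one smoothing so that unseen feature values keep nonzero probability.
    prior = np.array([(y == c).mean() for c in range(K)])
    theta = np.ones((n, K, Q))
    for xi, c in zip(X, y):
        for j in range(n):
            theta[j, c, xi[j]] += 1
    theta /= theta.sum(axis=2, keepdims=True)

    def predict(x):
        # log p(y = c | x) up to the common constant -log p(x)
        log_post = np.log(prior) + sum(np.log(theta[j, :, x[j]]) for j in range(n))
        return int(np.argmax(log_post))

    print(predict(np.array([0, 2, 1])))   # most probable class for this input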
In the case of binary features xi ∈ {0, 1} we obtain the Bernoulli naive Bayes classifier, whereby

    p(x|Ck) = ∏_{i=1}^{n} θki^xi (1 − θki)^(1−xi),

and we assume that p(xi|Ck) is Bernoulli with parameter θki (to be estimated).
The question that we want to answer is: “what is the probability that a
given document D (or, more precisely, its encoding x) belongs to a given
class Ck ?” In other words, what is p(Ck |x)?
By Bayes’ rule

    p(Ck|x) = p(x|Ck) p(Ck) / p(x) = (p(Ck)/p(x)) ∏_{i=1}^{n} p(xi|Ck).
Suppose there are only two classes C1 and C2 (e.g., spam and non-spam); then

    p(C1|x) = (p(C1)/p(x)) ∏_{i=1}^{n} p(xi|C1),    p(C2|x) = (p(C2)/p(x)) ∏_{i=1}^{n} p(xi|C2).

Classify as spam if p(C1|x)/p(C2|x) > 1.
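A minimal sketch of this two-class decision rule with Bernoulli features, in the same numpy-based style as above; the word-presence parameters theta1, theta2 and the class priors are illustrative values, not estimated from any real corpus.

    import numpy as np

    # Illustrative Bernoulli parameters theta_ki = p(word i present | class k)
    theta1 = np.array([0.80, 0.60, 0.10])    # C1 = spam
    theta2 = np.array([0.10, 0.30, 0.50])    # C2 = non-spam
    p1, p2 = 0.4, 0.6                        # class priors p(C1), p(C2)

    def log_posterior_ratio(x):
        """log [ p(C1|x) / p(C2|x) ]; the common factor 1/p(x) cancels."""
        ll1 = np.sum(x * np.log(theta1) + (1 - x) * np.log(1 - theta1))
        ll2 = np.sum(x * np.log(theta2) + (1 - x) * np.log(1 - theta2))
        return np.log(p1) - np.log(p2) + ll1 - ll2

    x = np.array([1, 1, 0])                  # encoded document (word presence/absence)
    print("spam" if log_posterior_ratio(x) > 0 else "non-spam")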
The response variable y is a binary output that describes whether or not the client will buy a computer.
We shall use this data to construct a Bayes’ classifier with the purpose of
predicting if a new client x will buy a computer or not.
There are two classes C1 (buys computer) and C2 (does not buy a
computer).
Under the Naive Bayes’ hypotheses, we wish to compute p(C1 |x), p(C2 |x):
    p(C1|x) = (p(C1)/p(x)) ∏_{i=1}^{n} p(xi|C1),    p(C2|x) = (p(C2)/p(x)) ∏_{i=1}^{n} p(xi|C2).
Observe that for the purpose of deciding if p(C1 |x) > p(C2 |x) we do not
need to evaluate the common denominator p(x)...
Example
Buyers’ classification
We next evaluate all the terms needed to build the classifier:
The marginal class probabilities p(C1 ), p(C2 ) are simply evaluated as the
empirical frequencies of the two classes in the data set:
    p(C1) = (number of clients that buy a computer)/N = 9/14 = 0.643,
    p(C2) = (number of clients that do not buy a computer)/N = 5/14 = 0.357.
Therefore,
    p(C1|x) = (p(C1)/p(x)) ∏_{i=1}^{n} p(xi|C1) = (0.643/p(x)) × 0.222 × 0.444 × 0.667 × 0.667 = 0.0282/p(x),

    p(C2|x) = (p(C2)/p(x)) ∏_{i=1}^{n} p(xi|C2) = (0.357/p(x)) × 0.600 × 0.400 × 0.200 × 0.400 = 0.0069/p(x).
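As a quick arithmetic check of the two products above (plain Python; the conditional probabilities are those reported in the slide):

    # Numerators of p(C1|x) and p(C2|x); the common factor 1/p(x) is omitted.
    score_C1 = 0.643 * 0.222 * 0.444 * 0.667 * 0.667   # ≈ 0.0282
    score_C2 = 0.357 * 0.600 * 0.400 * 0.200 * 0.400   # ≈ 0.0069
    print(score_C1, score_C2, "C1" if score_C1 > score_C2 else "C2")

Since 0.0282 > 0.0069 and the factor 1/p(x) is common to both expressions, the classifier predicts class C1, i.e., that the client buys a computer.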