
Statistics 2

Section 5 Maximum Likelihood Estimation

Week 14, Summer 2025

1. Definition

The idea of Maximum Likelihood Estimation (MLE) is due to Sir Ronald Fisher (1890-1962). A likelihood function is identical to the probability density function (pdf) or probability mass function (pmf) of a random variable 𝑋, but viewed from a different perspective:

pdf: 𝑥 ↦ 𝑓(𝑥|𝜽)
likelihood function: 𝜽 ↦ 𝑓(𝑥|𝜽)

where 𝑥 is a possible value of 𝑋 and 𝜽 is the set of parameters; for example, for the normal distribution 𝜽 = {𝜇, 𝜎}. When we say that a random variable 𝑋 has pdf 𝑓(𝑥|𝜽), this is a function of 𝑥 with the values of 𝜽 fixed or given. If instead we look at 𝑓(𝑥|𝜽) as a function of the parameters 𝜽, with the observed value 𝑥 given, then we have a likelihood function, and we write 𝑙(𝜽|𝑥) to emphasize this perspective (a function of 𝜽, given the observed value 𝑥).

For example, when we have a random variable 𝑋 ~ 𝑁(𝜇, 𝜎²), we can set 𝜇 = 0.80 and 𝜎² = 0.0016 and then simulate many values {𝑥₁, 𝑥₂, …, 𝑥ₙ} of 𝑋 from this distribution. Below are three samples of 2,500 observations each; the sample means and sample standard deviations differ, but all three samples come from the same distribution:

[Figure: three histograms (y-axis: number of observations, x-axis: 0.6 to 1.0), one for each simulated sample, with the pdf 𝑓(𝑥|𝜽) drawn as a red curve.
Sample 1: sample mean = 0.8009, sample standard deviation = 0.0400
Sample 2: sample mean = 0.8006, sample standard deviation = 0.0396
Sample 3: sample mean = 0.7995, sample standard deviation = 0.0395]
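The simulation above can be reproduced with a few lines of code. The sketch below (using numpy; the seed and the loop over three samples are arbitrary choices, not part of the notes) draws three samples of 2,500 observations from 𝑁(0.80, 0.0016) and prints their sample means and sample standard deviations.

import numpy as np

rng = np.random.default_rng(5)          # arbitrary seed for reproducibility
mu, sigma = 0.80, 0.04                  # sigma^2 = 0.0016

for k in range(3):
    x = rng.normal(mu, sigma, size=2500)        # one sample of 2,500 observations
    print(f"sample {k+1}: mean = {x.mean():.4f}, "
          f"std = {x.std(ddof=1):.4f}")         # sample standard deviation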
Therefore, when a random variable 𝑋 ~ 𝑁(𝜇, 𝜎²), the parameter values are fixed; in this example they are 𝜇 = 0.80 and 𝜎² = 0.0016. The pdf 𝑓(𝑥|𝜽) is the red curve, and it is a function of 𝑥, which is what we learned in Section 1 and Section 2.¹ Recall that on p.9 of Section 2 we had 𝑓(𝑥₁) = 𝑥₁ + 1/2 and 𝑓(𝑥₂) = 1/2 + 𝑥₂, and on p.13 we had 𝑓(𝑥₁) = 2(1 − 𝑥₁) and 𝑓(𝑥₂) = 2𝑥₂ in the exercise questions.

On the other hand, a likelihood function 𝑙(𝜽|𝑥) is a function of the parameters 𝜽, with the observed value 𝑥 fixed. We can understand this concept as follows: for the data we have at hand, say TSMC returns over the past 10 years, we make an assumption about the distribution of the data; for example, we may assume the returns are normally distributed with parameters 𝜇 and 𝜎², or that they follow a 𝑡 distribution with parameter 𝜈. Then, for the given data, we find the set of parameter values that gives the highest likelihood of seeing the data at hand. This approach is sensible: the data {𝑥₁, 𝑥₂, …, 𝑥ₙ} are already realized, so the parameter values that maximize the likelihood function are good estimates of the true parameters. Such estimators are called Maximum Likelihood Estimators, denoted by 𝜽̂_MLE.

As a result, we say that MLE is a parametric approach to making inference about the population from a sample, because we need to assume a distribution. Methods that make inference without assuming a distribution for the data are, by contrast, non-parametric methods.

¹ Note that in the figure, the y-axis is the number of observations in the histogram.

2. Some Analytical Examples

Bernoulli Distribution

Let 𝑋₁, 𝑋₂, …, 𝑋ₙ be a random sample which can be assumed to have a Bernoulli distribution with pmf:

p(x) = p^{x}(1-p)^{1-x}    (1)

for 𝑥 = 0, 1. The parameter 0 ≤ 𝑝 ≤ 1 is the probability of success. A random sample means that 𝑋₁, 𝑋₂, …, 𝑋ₙ are independent. Thus, the probability that we see 𝑋₁ = 𝑥₁, 𝑋₂ = 𝑥₂, …, 𝑋ₙ = 𝑥ₙ is the joint probability:

P(X_1 = x_1, X_2 = x_2, \ldots, X_n = x_n) = P(X_1 = x_1) \cdot P(X_2 = x_2) \cdots P(X_n = x_n)
= p^{x_1}(1-p)^{1-x_1} \cdot p^{x_2}(1-p)^{1-x_2} \cdots p^{x_n}(1-p)^{1-x_n}    (2)

Again, like the small 𝑥 in Section 1 and Section 2, here {𝑥₁, 𝑥₂, …, 𝑥ₙ} is a set of realized values of 0 and 1. Equation (2) can be written compactly as:

p^{\sum x_i}(1-p)^{n - \sum x_i}    (3)

Now, when we view (3) as a function of the parameter 𝑝, given the set of data {𝑥₁, 𝑥₂, …, 𝑥ₙ}, we have the likelihood function:

l(p) = p^{\sum x_i}(1-p)^{n - \sum x_i}, \quad 0 \le p \le 1    (4)

The maximum likelihood estimate of 𝑝 is the value that maximizes (4). However, this maximization is usually carried out on the log of the likelihood function, because after taking the log the result is much easier to work with:

L(p) = \left(\sum_{i=1}^{n} x_i\right)\ln p + \left(n - \sum_{i=1}^{n} x_i\right)\ln(1-p)    (5)

In addition, taking the log does not affect the optimization task.² Thus, to find the maximum of 𝐿(𝑝), we apply the First Order Condition (FOC):

\frac{\partial L(p)}{\partial p} = \frac{\sum_{i=1}^{n} x_i}{p} - \frac{n - \sum_{i=1}^{n} x_i}{1-p} = 0    (6)

Thus,

\frac{\sum_{i=1}^{n} x_i}{p} = \frac{n - \sum_{i=1}^{n} x_i}{1-p}
\Rightarrow (1-p)\sum_{i=1}^{n} x_i = p\left(n - \sum_{i=1}^{n} x_i\right)
\Rightarrow \sum_{i=1}^{n} x_i = pn

And the solution for 𝑝 is:

p = \frac{\sum_{i=1}^{n} x_i}{n}    (7)

➢ Do you think this result in (7) is sensible or reasonable?

² This is because log is a monotone function.
As a result, the maximum likelihood estimator 𝑝̂_MLE for a random sample which can be assumed to have a Bernoulli distribution is:

\hat{p}_{MLE} = \frac{1}{n}\sum_{i=1}^{n} X_i = \bar{X}    (8)
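As a quick numerical illustration (not part of the original notes), the sketch below simulates Bernoulli data with a known 𝑝, evaluates the log-likelihood (5) on a grid of candidate values, and confirms that the maximizer is the sample mean, as in (8). The true value 𝑝 = 0.3, the sample size and the seed are arbitrary choices.

import numpy as np

rng = np.random.default_rng(0)
p_true, n = 0.3, 1000                    # arbitrary values for the illustration
x = rng.binomial(1, p_true, size=n)      # Bernoulli(p) sample of 0s and 1s

s = x.sum()
grid = np.linspace(0.001, 0.999, 999)    # candidate values of p
logL = s * np.log(grid) + (n - s) * np.log(1 - grid)   # log-likelihood (5)

print("grid maximizer:", grid[np.argmax(logL)])
print("sample mean   :", x.mean())       # closed-form MLE from (8)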

Poisson Distribution

Let 𝑋₁, 𝑋₂, …, 𝑋ₙ be a random sample which can be assumed to have a Poisson distribution with pmf:

p(x) = \frac{\lambda^{x} e^{-\lambda}}{x!}    (9)

for 𝑥 = 0, 1, 2, … and 𝜆 > 0. The joint probability of 𝑋₁ = 𝑥₁, 𝑋₂ = 𝑥₂, …, 𝑋ₙ = 𝑥ₙ is:

P(X_1 = x_1, X_2 = x_2, \ldots, X_n = x_n) = P(X_1 = x_1) \cdot P(X_2 = x_2) \cdots P(X_n = x_n)
= \frac{\lambda^{x_1} e^{-\lambda}}{x_1!} \cdot \frac{\lambda^{x_2} e^{-\lambda}}{x_2!} \cdots \frac{\lambda^{x_n} e^{-\lambda}}{x_n!}
= \frac{1}{x_1!\, x_2! \cdots x_n!}\, \lambda^{\sum x_i} e^{-n\lambda}    (10)

The log-likelihood function is therefore:

L(\lambda) = -\ln(x_1!\, x_2! \cdots x_n!) + \left(\sum_{i=1}^{n} x_i\right)\ln\lambda - n\lambda    (11)

Applying the FOC, the solution for 𝜆 is:

\frac{\partial L(\lambda)}{\partial \lambda} = \frac{\sum_{i=1}^{n} x_i}{\lambda} - n = 0
\Rightarrow \lambda = \frac{\sum_{i=1}^{n} x_i}{n}    (12)

➢ Do you think this result in (12) is sensible or reasonable?

Again, as a result, the maximum likelihood estimator 𝜆̂_MLE for a random sample which can be assumed to have a Poisson distribution is:

\hat{\lambda}_{MLE} = \frac{1}{n}\sum_{i=1}^{n} X_i = \bar{X}    (13)
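The same result can be checked numerically. The sketch below (an illustration, not part of the notes) maximizes the Poisson log-likelihood (11) with scipy instead of solving the FOC by hand; the constant term −ln(x₁!⋯xₙ!) is dropped because it does not involve 𝜆. The true value 𝜆 = 4 and the sample size are arbitrary.

import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(1)
x = rng.poisson(lam=4.0, size=500)       # simulated Poisson data, true lambda = 4

def neg_logL(lam):
    # negative of (11), without the constant -ln(x1!...xn!) term
    return -(x.sum() * np.log(lam) - x.size * lam)

res = minimize_scalar(neg_logL, bounds=(1e-6, 50), method="bounded")
print("numerical MLE:", res.x)
print("sample mean  :", x.mean())        # closed-form MLE from (13)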

Exponential Distribution

Let 𝑋₁, 𝑋₂, …, 𝑋ₙ be a random sample which can be assumed to have an Exponential distribution with pdf:

f(x) = \lambda e^{-\lambda x}, \quad x \ge 0    (14)

The joint pdf of 𝑋₁ = 𝑥₁, 𝑋₂ = 𝑥₂, …, 𝑋ₙ = 𝑥ₙ is given as:

f(X_1 = x_1, X_2 = x_2, \ldots, X_n = x_n) = f(X_1 = x_1) \cdot f(X_2 = x_2) \cdots f(X_n = x_n)
= \lambda e^{-\lambda x_1} \cdot \lambda e^{-\lambda x_2} \cdots \lambda e^{-\lambda x_n}
= \lambda^{n} e^{-\lambda \sum x_i}

The log-likelihood function is therefore:

L(\lambda) = n\ln\lambda - \lambda \sum_{i=1}^{n} x_i    (15)

The FOC gives us:

\frac{\partial L(\lambda)}{\partial \lambda} = \frac{n}{\lambda} - \sum_{i=1}^{n} x_i = 0
\Rightarrow \frac{1}{\lambda} = \frac{\sum_{i=1}^{n} x_i}{n}

so the maximum likelihood estimator is 𝜆̂_MLE = 1/X̄, the reciprocal of the sample mean.

➢ Do you think this result for 𝜆̂_MLE is sensible or reasonable?
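A short simulation check of this result (an illustration, not part of the notes; the true rate 𝜆 = 2, the sample size and the seed are arbitrary):

import numpy as np

rng = np.random.default_rng(2)
x = rng.exponential(scale=1/2.0, size=2000)   # numpy's scale = 1/lambda, so true lambda = 2
lam_hat = 1.0 / x.mean()                      # MLE: lambda_hat = n / sum(x) = 1 / xbar
print("lambda_hat =", lam_hat)                # should be close to 2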

Normal Distribution

Let 𝑋₁, 𝑋₂, …, 𝑋ₙ be a random sample which can be assumed to have a Normal distribution with pdf:

f(x) = \frac{1}{\sqrt{2\pi\sigma^{2}}} \exp\!\left(-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^{2}\right)    (16)

for 𝑥 ∈ ℝ. The joint pdf of 𝑋₁ = 𝑥₁, 𝑋₂ = 𝑥₂, …, 𝑋ₙ = 𝑥ₙ is given as:

f(X_1 = x_1, X_2 = x_2, \ldots, X_n = x_n) = f(X_1 = x_1) \cdot f(X_2 = x_2) \cdots f(X_n = x_n)
= (2\pi\sigma^{2})^{-n/2} \exp\!\left(-\frac{1}{2\sigma^{2}} \sum_{i=1}^{n}(x_i - \mu)^{2}\right)    (17)

The log-likelihood function is therefore:

L(\mu, \sigma) = -\frac{n}{2}\ln 2\pi - n\ln\sigma - \frac{1}{2\sigma^{2}}\sum_{i=1}^{n}(x_i - \mu)^{2}    (18)

The FOC gives two equations:

\frac{\partial L(\mu,\sigma)}{\partial \mu} = \frac{1}{\sigma^{2}}\sum_{i=1}^{n}(x_i - \mu) = 0    (19)

\frac{\partial L(\mu,\sigma)}{\partial \sigma} = -\frac{n}{\sigma} + \frac{1}{\sigma^{3}}\sum_{i=1}^{n}(x_i - \mu)^{2} = 0    (20)

From (19), we can obtain:

\hat{\mu}_{MLE} = \frac{1}{n}\sum_{i=1}^{n} X_i = \bar{X}    (21)

With the solution to 𝜇, we obtain from (20) the solution to 𝜎:

n\sigma^{2} = \sum_{i=1}^{n}(x_i - \mu)^{2}
\Rightarrow \hat{\sigma}_{MLE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(X_i - \bar{X})^{2}}

Or, equivalently³,

\hat{\sigma}^{2}_{MLE} = \frac{1}{n}\sum_{i=1}^{n}(X_i - \bar{X})^{2}    (22)

➢ Do you think this result in (22) is sensible or reasonable?

³ Let 𝜃̂_MLE be the maximum likelihood estimator of 𝜃 and let 𝑔 be a nice function; then the maximum likelihood estimator for 𝑔(𝜃) is given by 𝑔(𝜃̂_MLE).
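A quick numerical check of (21) and (22) (an illustration, not part of the notes): for normal data, the MLE of 𝜇 is the sample mean, and the MLE of 𝜎² divides the sum of squared deviations by 𝑛 rather than 𝑛 − 1, which is what numpy's var computes with ddof=0.

import numpy as np

rng = np.random.default_rng(3)
x = rng.normal(0.80, 0.04, size=2500)   # same distribution as the earlier figure

mu_hat = x.mean()                       # (21)
sigma2_hat = np.var(x, ddof=0)          # (22): divide by n, not n - 1
print("mu_hat     =", mu_hat)
print("sigma2_hat =", sigma2_hat)       # compare with true sigma^2 = 0.0016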
3. The Properties of Maximum Likelihood Estimator

From the above examples, what can we say about the properties of maximum likelihood estimators 𝜽̂_MLE? From the results for the Bernoulli, Poisson, Exponential and Normal distributions, we see that the MLE estimators are the sample mean X̄ or a simple function of it; this suggests that 𝜽̂_MLE will have a normal distribution according to the Central Limit Theorem. Recall that in the previous subsection we obtained these solutions for 𝜽̂_MLE by solving the FOC, but there are many situations where we cannot solve the FOC directly and an analytical solution for 𝜽̂_MLE cannot be found.

When we cannot obtain an analytical solution for the maximum likelihood estimator 𝜽̂_MLE, we can still show that 𝜽̂_MLE usually has a normal distribution, provided some regularity conditions hold. These conditions are, however, too technical (and very much beyond the scope of this course), so we do not state them here.
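As an illustration of the numerical route (this example is mine, not from the notes), suppose we assume, as mentioned earlier for TSMC returns, that the data follow a 𝑡 distribution with parameter 𝜈. There is no simple closed-form solution for 𝜈̂_MLE, so we maximize the log-likelihood numerically, here with scipy.

import numpy as np
from scipy import stats
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(4)
x = stats.t.rvs(df=5, size=3000, random_state=rng)   # simulated data, true nu = 5

def neg_logL(nu):
    # negative log-likelihood of a t distribution with nu degrees of freedom
    return -np.sum(stats.t.logpdf(x, df=nu))

res = minimize_scalar(neg_logL, bounds=(2.01, 100), method="bounded")
print("nu_hat_MLE =", res.x)                         # should be close to 5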

A very important point is that, since 𝜽̂_MLE usually has a normal distribution, 𝜽̂_MLE is a random variable. Therefore, we can talk about its expectation as well as its variance or standard deviation. These discussions give us three properties of an estimator: unbiasedness, consistency and efficiency.

Before we dive into these properties of 𝜽̂_MLE, we need to consider why the maximum likelihood estimator 𝜽̂_MLE is a random variable. In fact, we can have many different estimators, and 𝜽̂_MLE is just one of them. An estimator is a random variable because it is calculated from a random sample. Previously, we became familiar with the idea that 𝑋 is a random variable which may have pdf 𝑓(𝑥), with 𝜽 the parameter(s). In Section 4, when we solved the exercises we usually knew the value of 𝜽. In real situations, we are given a sample or data set and do not know the true value of 𝜽, and we try to estimate this value from a sample:

{𝑋₁, 𝑋₂, …, 𝑋ₙ}

We often assume that this is a random sample, which means that the random variables 𝑋₁, 𝑋₂, …, 𝑋ₙ are independent and have the same distribution as 𝑋. In other words, we say 𝑋₁, 𝑋₂, …, 𝑋ₙ are i.i.d. (independent and identically distributed). A statistic 𝑇 is a function of the random sample, that is, 𝑇 = 𝑇(𝑋₁, …, 𝑋ₙ); when we use 𝑇 = 𝑇(𝑋₁, …, 𝑋ₙ) to estimate 𝜃, we say 𝑇 is a point estimator of 𝜃. For example, suppose 𝑋₁, 𝑋₂, …, 𝑋ₙ is a random sample from a distribution (not necessarily normal) with mean 𝜇 and variance 𝜎²; then the sample mean X̄ is used to estimate 𝜇, and the sample variance 𝑆² is used to estimate 𝜎².

In real applications, we are usually comfortable with the "identically distributed" part of the i.i.d. assumption. However, the independence part may not hold for {𝑋₁, 𝑋₂, …, 𝑋ₙ}, and we then have to account for the correlation in {𝑋₁, 𝑋₂, …, 𝑋ₙ}. A very important example is the GARCH model for daily financial returns.⁴

⁴ You are all welcome to take FM323 Financial Risk Management next semester!
Now we can briefly discuss the three properties of an estimator. An estimator is unbiased if its expectation equals the parameter it aims to estimate. For example, the sample mean X̄ from a random sample {𝑋₁, 𝑋₂, …, 𝑋ₙ} is an unbiased estimator of 𝜇 because:

E[\bar{X}] = E\!\left[\frac{1}{n}\sum_{i=1}^{n} X_i\right] = \frac{1}{n}\sum_{i=1}^{n} E[X_i] = \frac{1}{n}(n\mu) = \mu    (23)

As another example, consider the sample variance 𝑆², which is defined as:

S^{2} = \frac{1}{n-1}\sum_{i=1}^{n}(X_i - \bar{X})^{2}    (24)

We can show that:

S^{2} = \frac{1}{n-1}\sum_{i=1}^{n}\left(X_i^{2} - 2X_i\bar{X} + \bar{X}^{2}\right)
= \frac{1}{n-1}\left(\sum_{i=1}^{n} X_i^{2} - 2\sum_{i=1}^{n} X_i\bar{X} + \sum_{i=1}^{n}\bar{X}^{2}\right)
= \frac{1}{n-1}\left(\sum_{i=1}^{n} X_i^{2} - 2n\bar{X}^{2} + n\bar{X}^{2}\right)
= \frac{1}{n-1}\left(\sum_{i=1}^{n} X_i^{2} - n\bar{X}^{2}\right)    (25)

Thus, using E[X_i^{2}] = E[X^{2}] = \sigma^{2} + \mu^{2} and \mathrm{Var}(\bar{X}) = \sigma^{2}/n, we have:

E[S^{2}] = \frac{1}{n-1}\left(\sum_{i=1}^{n} E[X_i^{2}] - nE[\bar{X}^{2}]\right)
= \frac{1}{n-1}\left(n\sigma^{2} + n\mu^{2} - n\left(\frac{\sigma^{2}}{n} + \mu^{2}\right)\right) = \sigma^{2}    (26)

Thus, we see the sample variance 𝑆² in (24) is indeed an unbiased estimator of 𝜎². However, if we define:

V = \frac{1}{n}\sum_{i=1}^{n}(X_i - \bar{X})^{2}    (27)

then 𝑉 is a biased estimator of 𝜎² because:

E[V] = \frac{n-1}{n}\,\sigma^{2} \ne \sigma^{2}

The use of (𝑛 − 1) in 𝑆² is also known as Bessel's correction.
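A small simulation (an illustration, not from the notes) makes the bias visible: over many samples of size 𝑛 = 10 from a distribution with 𝜎² = 1, the average of 𝑉 is close to (𝑛 − 1)/𝑛 = 0.9, while the average of 𝑆² is close to 1.

import numpy as np

rng = np.random.default_rng(6)
n, reps = 10, 100_000
samples = rng.normal(0.0, 1.0, size=(reps, n))   # true sigma^2 = 1

S2 = samples.var(axis=1, ddof=1)   # sample variance with Bessel's correction, as in (24)
V  = samples.var(axis=1, ddof=0)   # biased version (27), divides by n

print("average of S^2:", S2.mean())   # approximately 1.0
print("average of V  :", V.mean())    # approximately (n-1)/n = 0.9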

Next, we say an estimator is consistent if it converges in probability to the target parameter 𝜃. "Convergence in probability" means that, as the sample size 𝑛 → ∞, the probability that the estimator lies within any small distance 𝜀 > 0 of the target parameter goes to 1:

\lim_{n\to\infty} P(|T_n - \theta| < \varepsilon) = 1

Both the sample mean X̄ and the sample variance 𝑆² are consistent estimators of their target parameters, that is, for every 𝜀 > 0:

\lim_{n\to\infty} P(|\bar{X}_n - \mu| < \varepsilon) = 1

and

\lim_{n\to\infty} P(|S_n^{2} - \sigma^{2}| < \varepsilon) = 1

The first result, about 𝜇, can be proved using Chebyshev's Inequality and the Weak Law of Large Numbers, but we omit it here.
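The sketch below (mine, not from the notes) illustrates the definition for the sample mean: for a fixed 𝜀, the estimated probability that |X̄ₙ − 𝜇| < 𝜀 rises toward 1 as 𝑛 grows. The distribution 𝑁(0.80, 0.04²), the choice 𝜀 = 0.001 and the number of repetitions are arbitrary.

import numpy as np

rng = np.random.default_rng(7)
mu, sigma, eps, reps = 0.80, 0.04, 0.001, 1000

for n in (10, 100, 1_000, 10_000):
    xbars = rng.normal(mu, sigma, size=(reps, n)).mean(axis=1)   # reps sample means of size n
    prob = np.mean(np.abs(xbars - mu) < eps)                     # estimate of P(|Xbar_n - mu| < eps)
    print(f"n = {n:>6d}: P(|Xbar - mu| < {eps}) ~ {prob:.3f}")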

Finally, an estimator is efficient if it is unbiased and its variance is the smallest possible, in the sense that it attains a lower bound called the Rao-Cramér lower bound. The idea is that an estimator is more efficient if it has a smaller variance. We can illustrate the concepts of unbiasedness and efficiency as follows:

[Figure: illustration of unbiasedness and efficiency.]
In conclusion, maximum likelihood estimators 𝜽̂_MLE (1) may or may not be unbiased, (2) are consistent and (3) are asymptotically efficient, under the regularity conditions and provided the assumption about the distribution is correct. The word "asymptotically" stands for "as the sample size 𝑛 → ∞." Another nice and important property is that 𝜽̂_MLE usually has a normal distribution, again under the regularity conditions and provided the assumption about the distribution is correct.
4. Maximum Likelihood Estimation of Linear Regression

In this subsection, we solve the linear regression problem using the maximum likelihood method. Previously, in Section 3, we showed that a linear regression can be estimated by the method of least squares; here we show that, for simple linear regression, the least-squares estimator 𝜽̂_LS and the maximum likelihood estimator 𝜽̂_MLE are the same, that is, the two methods give the same solutions for the parameters (𝑎, 𝑏) in the model. In addition, with the maximum likelihood method we can obtain the expected values, as well as the standard errors, of (𝑎, 𝑏).

We begin with the same scatter plot as in Section 3:

[Figure: scatter plot of 𝑋 against 𝑌 from Section 3.]

Given this scatter plot, we can agree that 𝑋 and 𝑌 display a linear relationship, but the relationship is not perfect. That is, there will be some residuals when we fit a straight line to the points (𝑥₁, 𝑦₁), (𝑥₂, 𝑦₂), …, (𝑥ₙ, 𝑦ₙ).
In the method of least squares, we determine the straight line by minimizing the sum of squared residuals (SSR). In the maximum likelihood method, we take a different approach and consider the properties of these residuals. First, since they are residuals, by definition their expected value should be zero; this is like what we have mentioned before: some of the residuals are positive and some are negative, so on average the expected value is 0. Second, there will be large residuals and small residuals; it is then sensible to assume the residuals follow a normal distribution 𝑁(0, 𝜎²) with mean 𝜇 = 0 and variance 𝜎².

As a result, we can write down the regression model:

Y_i = a + b x_i + e_i

where the residuals {𝑒₁, 𝑒₂, …, 𝑒ₙ} are random variables assumed to be i.i.d. with normal distribution 𝑁(0, 𝜎²). Note that here 𝑌ᵢ is a random variable constructed from the observed value 𝑥ᵢ and the random variable 𝑒ᵢ.
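The claim that least squares and maximum likelihood give the same (𝑎, 𝑏) can be checked numerically. The sketch below (an illustration under the stated assumptions; the simulated data, parameter values and seed are mine) fits a simulated data set both ways: once with ordinary least squares and once by numerically maximizing the normal log-likelihood of the residuals over (𝑎, 𝑏, 𝜎).

import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(8)
n = 200
x = rng.uniform(0, 10, size=n)
y = 1.5 + 0.8 * x + rng.normal(0, 2.0, size=n)   # Y_i = a + b x_i + e_i with e_i ~ N(0, sigma^2)

# Least squares
b_ls, a_ls = np.polyfit(x, y, deg=1)             # polyfit returns [slope, intercept]

# Maximum likelihood: residuals assumed i.i.d. N(0, sigma^2)
def neg_logL(theta):
    a, b, log_sigma = theta
    sigma = np.exp(log_sigma)                    # parameterize by log(sigma) to keep sigma > 0
    resid = y - (a + b * x)
    return -np.sum(-np.log(sigma) - 0.5 * (resid / sigma) ** 2)   # constant terms dropped

start = np.array([y.mean(), 0.0, np.log(y.std())])   # rough starting values
res = minimize(neg_logL, x0=start, method="Nelder-Mead")
a_mle, b_mle = res.x[0], res.x[1]

print("least squares     :", a_ls, b_ls)
print("maximum likelihood:", a_mle, b_mle)       # essentially the same values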

