
MTH211A: Theory of Statistics

Module 1: Introduction to Inference


January 30, 2024

Week 0

I. What is statistical inference?


The basic idea of statistical inference is to learn about an underlying truth from data.
We illustrate this with some examples.

Example 1. Suppose a 25-acre piece of land is split into 10,000 plots of equal area, and a seed
of type A is planted in each plot on 1 August 2023. After one month, 15 plots are surveyed at
random, and the following germination status is observed (see Table 1).

sample plot 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
Germination status 1 0 1 0 1 0 1 1 1 0 1 1 0 1 1

Table 1: Germination status of 15 sample plots

Here germination status 1 indicates that the seed has germinated, while 0 indicates that it has
not. We are interested in the proportion of germination, θ.

To answer this statistical problem we need to build an appropriate model and feed the observed
data into that model.

First note that, as we cannot enumerate the entire 25 acres of land, we need to infer the proportion
θ from a randomly selected sub-population, called a sample.
How are the data related to the sample?

The word ‘sample’ is typically used to mean a part of a totality. Here the totality is called the
population. The population in this example is the collection of all the plots in the 25 acres of land.
The fact that some units are randomly picked from the population induces two characteristics:
(i) The germination statuses of the sampled units are not known in advance, hence they are random.
The data in Table 1 are the realization of the random sample.
(ii) The sampled units are expected to share the characteristics of the population. Thus,
if the population has a high rate of germination (θ is large), then each sampled seed also has a
high probability of germination.
To answer the above problem, we have to make a few assumptions about the underlying truth. This
set of assumptions taken together is called the model, or statistical model. In this problem we may
assume that (a1) the soil condition and other environment-related factors remain nearly identical
over the entire 25 acres of land. Further, we may assume that (a2) the sampled plots are independent,
in the sense that the germination status of one plot does not affect that of the other plots. The above
two assumptions together build the following model:
Define Xi: germination status of the i-th sampled plot, i = 1, . . . , 15. Then X1, . . . , X15 are iid Bernoulli(θ)
for some 0 < θ < 1. Due to (ii), θ is the probability that the seed in the i-th sampled
plot germinates. Further, due to assumption (a1), this probability is the same across plots (identical
distribution). The unknown probability θ is called a parameter of the model.
As we do not know θ, we need to approximate it using the sample realizations, i.e., the data. If the
iid Bernoulli(θ) model is true, then the realizations of X1, · · · , X15 provide an idea about θ. For
example, from Table 1 we see that a total of Σi Xi = 10 plants germinated out of 15. Thus,
X̄ = 10/15 = 2/3 seems to be a reasonable estimate of the rate of germination. Statistical inference is the
procedure of coming up with an appropriate method of estimating the parameter θ, or a function
g(θ) of the parameter, based on a sample.
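As an illustration (not part of the original notes), the estimation step of Example 1 can be carried out in a few lines of Python; the data below are the Table 1 realizations, and the model is the iid Bernoulli(θ) model assumed above.

    # Sketch of the estimation step in Example 1 (illustrative, not from the notes).
    # Assumes the iid Bernoulli(theta) model; the data are the Table 1 realizations.
    germination = [1, 0, 1, 0, 1, 0, 1, 1, 1, 0, 1, 1, 0, 1, 1]  # x_1, ..., x_15

    n = len(germination)              # sample size, n = 15
    theta_hat = sum(germination) / n  # sample mean X-bar = 10/15 = 2/3

    print(f"estimated germination proportion: {theta_hat:.3f}")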

Example 2. Suppose we are interested in estimating the acceleration due to gravity g. The usual way
to estimate g is via the pendulum experiment, where g ≈ 4π²l/T², l being the length of a
simple pendulum and T the time period of one oscillation. Suppose the length of a
simple pendulum is 75 cm. Due to variation arising from several factors, such as the skill of
the experimenter and measurement error, T cannot be measured exactly. Instead, 10 repeated
measurements are taken, which are given in Table 2.

Repetitions 1 2 3 4 5 6 7 8 9 10
Measurements (s) 1.822 1.684 1.688 1.758 1.739 1.805 1.809 1.702 1.614 1.764

Table 2: Results of 10 repeated measurements

Set your assumptions to build a model. Identify the parameter(s). Can you express the quantity
of interest g as a function of the parameter(s)? Can you interpret the data in Table 2 as the realization
of a sample from some population? How would you feed the data into the model and estimate
g?
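One possible sketch of an answer, under the assumption (ours, not the notes') that the measurements T1, . . . , T10 are an iid sample whose expectation is the true period, is to estimate the period by the sample mean T̄ and plug it into g ≈ 4π²l/T̄²:

    import math

    # One possible sketch for Example 2 (an assumption, not the notes' prescribed solution):
    # treat the 10 measurements as an iid sample of the true period, estimate it by the
    # sample mean, and plug into g ~ 4*pi^2*l / T^2 with l = 0.75 m.
    measurements = [1.822, 1.684, 1.688, 1.758, 1.739, 1.805, 1.809, 1.702, 1.614, 1.764]  # seconds
    l = 0.75  # pendulum length in metres

    T_bar = sum(measurements) / len(measurements)   # estimated period
    g_hat = 4 * math.pi ** 2 * l / T_bar ** 2       # estimated acceleration due to gravity

    print(f"T-bar = {T_bar:.4f} s, estimated g = {g_hat:.3f} m/s^2")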

II. Now we formally define the different ingredients of statistical inference.
(a) Population. In statistical inference we seek information about some numerical characteristic
of a collection of units, called the population. A population can be finite or infinite.
Examples. (i) Suppose you are interested in the average grade obtained by students in
MTH211. Then the population consists of all students who took this course in past
years, are taking it this year, or will take it in future years. (ii) Suppose you are interested
in the proportion of homeless people living in Kanpur; then the population consists of every
individual who lives in Kanpur. (iii) Suppose you are interested in the lifetime of Samsung
mobiles of a particular model A; then the population consists of all Samsung mobiles of model
A.
(b) Sample. It is often not possible to enumerate the entire population due to several constraints,
e.g., time, cost, inaccessibility, etc. In such cases, one examines a part of the
population, called a sample. The sample must be a good representative of the population. How
does one select a representative sample? One way is to select the sample units at random.
(c) Random sample. Suppose we are interested in the proportion of students in IITK whose
weekly study time was between 40 and 60 hours on average in 2023. If we had enumerated all
students of IITK each week throughout 2023 and prepared a grand list of average study
times, then it would be possible to calculate the proportion exactly. Suppose the histogram
in Figure 1 represents the completely enumerated data. Let the true proportion of interest be
µ; then µ is the relative frequency of the purple portion in Figure 1.

Figure 1: Histogram of average weekly study time (in hours) of students of IITK

Now, suppose we take a random element from this population (i.e., we randomly enquire of a
student of IITK). Prior to observing the sample, its realization is a random phenomenon to us,
and we expect the features of the population to be present in the sample. Let X denote the
random variable indicating the average weekly study time of the randomly selected student.
Then the probability P(X ∈ (40, 60)) is expected to be the same as µ. Further, this phenomenon
remains unaltered if we replace the numbers 40 and 60 by arbitrary a, b ∈ R with a < b.
Therefore, the probability distribution of the random sample is expected to be the same as the
relative frequency distribution of the population. Suppose we denote this distribution by F.
Then we say that the random sample X is distributed as F.
Now, suppose instead of considering a random sample of size 1, we consider a random sample
of size n (i.e., we randomly select n students and enquire). For each of these samples, we have
an associated random variable. Let us denote these by X1, . . . , Xn. Then we expect that
each Xi has the same underlying probability distribution (marginally). Thus Xi is identically
distributed as the distribution F, for each i = 1, . . . , n.
Finally, unless otherwise mentioned, there is no reason to believe that the samples are dependent.
Thus, we assume that the random samples are mutually independent.

Combining the above facts, the term “X1, . . . , Xn is a random sample from F” indicates that
X1, . . . , Xn are iid with common distribution F. Here F can be the probability distribution
of some random variable, for example, normal, binomial, etc. It depends on the nature of the
underlying population. In the above example (see Figure 1), it is natural to consider F to be a
mixture of two normal distributions, one centered around 20 with a narrow standard deviation,
and the other centered around 55 with a large standard deviation.
(d) Statistical Model. In practice, it is not feasible to see the relative frequency distribution
of the population and choose F accordingly. Instead, statisticians make
assumptions about F based on their experience, past records, or a pilot survey. This set of
assumptions (including the assumptions on the samples, e.g., that the sample units are iid),
taken together, is called the statistical model. The statistical model must be a reasonable one.
Examples. Throughout 2020-2022, many mathematicians, statisticians, computer scientists,
and epidemiologists tried to model the growth of COVID-19. Although COVID-19 was a
newly discovered disease, the growth modelling was done using past knowledge of pandemic
growth mechanisms. See this article for a comprehensive review. Based on the assumed model,
scientists tried to infer the number of affected persons on a future date, or the time to the eventual
eradication of the disease, etc.
(e) Parameter. Sometimes the form of the underlying probability distribution F is assumed to
be known except for some constants. For instance, in the previous example, a statistician
may suspect the bimodal nature of students' study time and the bell-shaped nature of each
of the sub-populations. However, s/he may not know the locations of the modes, the
spreads, or the proportion of students in each sub-population. Then s/he may set F to be a
mixture of two normal distributions, i.e., π × normal(µ1, σ1²) + (1 − π) × normal(µ2, σ2²). Thus,
F is completely specified except for the unknown constants π (proportion of students in the first
sub-population), µi (locations of the sub-populations) and σi² (scales of the sub-populations),
i = 1, 2. These unknown quantities are called parameters.
To answer different questions related to this population, one needs to make inference about these
parameters only. This type of inference is called parametric inference. Another class of
inference problems is solved without assuming a particular form of F; this is called
non-parametric inference. In MTH211, we will consider parametric inference only.
(f) Statistic. In parametric as well as non-parametric inference, the ultimate goal is to infer
a population feature, say µ, based on sample realizations. Statisticians often put forward a
summary function of the samples as an estimate of µ. For instance, in the germination
example (see Example 1), X̄ was used to estimate θ; in the study-time example, the sample
proportion of students having an average weekly study time between 40 and 60 hours can be used
to estimate µ (a small simulation sketch follows this list). Such functions of observable samples are called sample statistics.
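To tie (c), (e) and (f) together, here is a small simulation sketch: it draws a sample from a hypothetical two-component normal mixture (the mixture weight, means and standard deviations below are made-up illustrative values, not quantities from Figure 1) and computes the sample proportion of study times in (40, 60) as a statistic estimating µ.

    import random

    # Illustrative sketch only: the mixture weight, means and SDs below are made-up
    # values for a population like Figure 1, not numbers from the notes.
    random.seed(1)

    def draw_study_time():
        # mixture: pi * normal(mu1, sd1^2) + (1 - pi) * normal(mu2, sd2^2)
        if random.random() < 0.4:              # pi = 0.4 (hypothetical)
            return random.gauss(20.0, 4.0)     # first sub-population
        return random.gauss(55.0, 12.0)        # second sub-population

    n = 500
    sample = [draw_study_time() for _ in range(n)]

    # sample statistic: proportion of sampled students with study time in (40, 60)
    mu_hat = sum(1 for x in sample if 40 < x < 60) / n
    print(f"estimated proportion in (40, 60): {mu_hat:.3f}")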

Some notations.

1. Random quantities, including random samples, will be indicated by capital letters, X1, X2, · · · .
The realizations will be indicated by small letters, x1, x2, . . ., e.g., P(X = x).
2. Boldface will be used to indicate vectors, e.g., X indicates a random vector and x a vector
of realizations.

3. The parameters of a distribution are treated as unknown fixed quantities in frequentist inference,
and will be indicated in Greek letters, for example, µ, σ, etc. Here also, boldface will be used to
indicate parameter vectors, for example, µ.

Week 1

III. More on statistics and sampling distributions. Let X1, . . . , Xn be a random sample of
size n from a population F. The collection of all possible values of (X1, . . . , Xn) is called the
sample space. As random variables take values in R, the sample space is a subset
of Rn. Let T(·) be a real (or vector) valued function whose domain includes the sample space of
(X1, . . . , Xn). Then the random variable (or vector) Y = T(X1, . . . , Xn) is called a statistic.
As (X1 , . . . , Xn ) is random, so is a statistic Y = T (X1 , . . . , Xn ). The probability distribution of
a statistic is called the sampling distribution of the statistic.
Note that a statistic does NOT involve a parameter. However, the sampling distribution of a
statistic may involve parameters. For example, let X1, . . . , Xn be a random sample of size n from
the normal(µ, σ²) distribution. Then the sample mean n^{−1} Σ_{i=1}^n Xi is a statistic, and its distribution is normal(µ, σ²/n).
However, Σ_{i=1}^n (Xi − µ)/S is not a statistic, as it involves the unknown parameter µ.
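A sampling distribution can be visualised by simulation. The sketch below (with arbitrary illustrative choices of µ, σ and n) draws many normal(µ, σ²) samples, computes the statistic X̄n for each, and compares the mean and variance of the simulated X̄n values with µ and σ²/n.

    import random
    import statistics

    # Sketch: approximate the sampling distribution of the statistic X-bar_n by simulation.
    # mu, sigma and n below are arbitrary illustrative choices.
    random.seed(2)
    mu, sigma, n, replications = 10.0, 2.0, 25, 5000

    xbars = []
    for _ in range(replications):
        sample = [random.gauss(mu, sigma) for _ in range(n)]
        xbars.append(sum(sample) / n)        # value of the statistic for this sample

    print(f"mean of X-bar: {statistics.mean(xbars):.3f} (theory: {mu})")
    print(f"variance of X-bar: {statistics.variance(xbars):.4f} (theory: {sigma ** 2 / n})")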

Support of a random variable


Let X be a discrete random variable. Then the support of X, say SX , is the collection of points x
in R such that P (X = x) > 0, i.e., SX = {x ∈ R : P (X = x) > 0}.
Let X be a continuous random variable with CDF FX. Then the support of X, say SX, is the
collection of points x in R such that X has positive probability mass in every non-trivial neighborhood of x,
i.e., SX = {x ∈ R : FX(x + h) − FX(x − h) > 0 for all h > 0}.
Thus, if X is a discrete (or continuous) random variable with pmf (or pdf) fX and support SX,
then fX(x) = 0 for every x ∉ SX. (WHY? Is the converse of this statement true?)

Some important distributions


1. Discrete distributions

(a) Bernoulli. Let X ∼ Bernoulli(p) distribution. Then

P(X = x) = p^x (1 − p)^{1−x}, SX = {0, 1}, 0 < p < 1.

(b) Binomial. Let X ∼ binomial(n, p) distribution. Then

P(X = x) = (n choose x) p^x (1 − p)^{n−x}, SX = {0, 1, . . . , n}, 0 < p < 1, n ∈ N.

(c) Poisson. Let X ∼ Poisson(λ) distribution. Then

P(X = x) = exp{−λ} λ^x / x!, SX = {0, 1, . . .}, λ > 0.

(d) Geometric. Let X ∼ geometric(p) distribution. Then

P(X = x) = (1 − p)^{x−1} p, SX = {1, 2, . . .}, 0 < p < 1.

5
2. Continuous distributions
(a) Uniform. Let X ∼ uniform(α, β) distribution. Then the probability density function of X,
fX(·), is given by

fX(x) = 1/(β − α), α < x < β, SX = [α, β], α, β ∈ R, β > α.

(b) Gamma. Let X ∼ Gamma(α, β) distribution. Then the probability density function of X, fX(·),
is given by

fX(x) = (β^α / Γ(α)) x^{α−1} e^{−βx}, x > 0, SX = [0, ∞), α > 0, β > 0.

(c) Exponential. Let X ∼ exponential(λ) distribution. Then the probability density function of
X, fX(·), is given by

fX(x) = λ e^{−λx}, x > 0, SX = [0, ∞), λ > 0.

Clearly, exponential(λ) is a special case of the Gamma distribution with parameters 1 and λ.

(d) Normal. Let X ∼ normal(µ, σ²) distribution. Then the probability density function of X, fX(·),
is given by

fX(x) = (1/(√(2π) σ)) exp{−(x − µ)² / (2σ²)}, x ∈ SX = R, µ ∈ R, σ > 0.

(e) Beta. Let X ∼ beta(α, β) distribution. Then the probability density function of X, fX(·), is
given by

fX(x) = (Γ(α + β)/(Γ(α)Γ(β))) x^{α−1} (1 − x)^{β−1}, 0 < x < 1, SX = [0, 1], α > 0, β > 0.

(f) Cauchy. Let X ∼ Cauchy(µ, σ) distribution. Then the probability density function of X, fX(·),
is given by

fX(x) = (1/(πσ)) [1 + ((x − µ)/σ)²]^{−1}, x ∈ SX = R, µ ∈ R, σ > 0.

(g) Chi-squared distribution. Let Xi ∼ normal(0, 1) distribution, iid, i = 1, . . . , n. Then T =
Σ_{i=1}^n Xi² follows a chi-squared distribution with degrees of freedom (d.f.) n, notationally T ∼ χ²_n,
and the pdf of T, fT, is given by

fT(t) = (1/(2^{n/2} Γ(n/2))) t^{n/2−1} e^{−t/2}, t > 0, ST = [0, ∞), n ∈ N.

The chi-squared distribution with d.f. n is a special case of the Gamma distribution with parameters n/2
and 1/2.

(h) F distribution. Let X ∼ χ²_{n1}, Y ∼ χ²_{n2}, with X and Y independently distributed. Then
F = n2 X/(n1 Y) follows an F distribution with d.f. n1 and n2, notationally F ∼ F_{n1,n2}, and the
pdf of F, fF, is given by

fF(x) = (Γ((n1 + n2)/2)/(Γ(n1/2)Γ(n2/2))) (n1/n2)^{n1/2} x^{n1/2−1} (1 + (n1/n2) x)^{−(n1+n2)/2}, x > 0, SF = [0, ∞).

(i) t distribution. Let X ∼ normal(0, 1), Y ∼ χ²_n, with X and Y independently distributed. Then
W = X/√(Y/n) follows a t-distribution with d.f. n, notationally W ∼ t_n, and the pdf of W, fW, is
given by

fW(x) = (Γ((n + 1)/2)/(√(πn) Γ(n/2))) (1 + x²/n)^{−(n+1)/2}, x ∈ SW = R.
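As a quick numerical sanity check (an illustration, not part of the notes), any of the densities above can be coded directly from its formula and integrated numerically; the result should be approximately 1. The sketch below does this for the Gamma and normal densities, with arbitrary parameter values.

    import math

    # Sketch: code two of the densities above directly from their formulas and check
    # numerically that each integrates to (approximately) 1.

    def gamma_pdf(x, alpha, beta):
        # f(x) = beta^alpha / Gamma(alpha) * x^(alpha - 1) * exp(-beta * x), x > 0
        return beta ** alpha / math.gamma(alpha) * x ** (alpha - 1) * math.exp(-beta * x)

    def normal_pdf(x, mu, sigma):
        # f(x) = (1 / (sqrt(2*pi) * sigma)) * exp(-(x - mu)^2 / (2 * sigma^2))
        return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)

    def integrate(f, a, b, m=100000):
        # simple midpoint Riemann-sum approximation of the integral of f over [a, b]
        h = (b - a) / m
        return sum(f(a + (i + 0.5) * h) for i in range(m)) * h

    print(integrate(lambda x: gamma_pdf(x, alpha=2.5, beta=1.5), 1e-9, 50))  # approx. 1
    print(integrate(lambda x: normal_pdf(x, mu=0.0, sigma=1.0), -10, 10))    # approx. 1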

Properties (Homework):
1. Find the expectations and variances of each of the above distributions (if they exist).
2. (Additive properties) Prove the following statements using moment generating functions or
characteristic functions.

(a) Let Xi ∼ binomial(ni, p) independently, for i = 1, . . . , k; then T = Σ_{i=1}^k Xi follows binomial(Σ_i ni, p).

(b) Let Xi ∼ Poisson(λi) independently, for i = 1, . . . , n; then T = Σ_{i=1}^n Xi follows Poisson(Σ_i λi).

(c) Let Xi ∼ normal(µi, σi²) independently, for i = 1, . . . , n; then T = Σ_{i=1}^n Xi follows normal(Σ_i µi, Σ_i σi²).

(d) Let Xi ∼ Gamma(αi, β) independently, for i = 1, . . . , n; then T = Σ_{i=1}^n Xi follows Gamma(Σ_i αi, β).

(e) Let Xi ∼ χ²_{ni} independently, for i = 1, . . . , k; then T = Σ_{i=1}^k Xi follows χ²_N, where N = Σ_i ni.

3. Let X ∼ normal(µ, σ²) distribution; then T = aX + b ∼ normal(aµ + b, a²σ²).

4. Let X ∼ Gamma(α, β) distribution; then T = aX ∼ Gamma(α, β/a).

5. Let X ∼ beta(n/2, m/2) distribution; then T = mX/{n(1 − X)} ∼ F_{n,m}.

6. Let X ∼ uniform(0, 1) distribution and α > 0; then T = X^{1/α} ∼ beta(α, 1).

7. Let X ∼ Cauchy(0, 1) distribution; then T = 1/(1 + X²) ∼ beta(0.5, 0.5).

8. Let X ∼ uniform(0, 1) distribution; then T = −2 log X ∼ χ²_2 (a simulation check of this property appears after this list).

9. Let X be distributed as some absolutely continuous distribution with CDF GX; then T = GX(X) ∼
uniform(0, 1).
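Many of these homework properties can be checked by Monte Carlo simulation. The following sketch checks property 8: T = −2 log X should follow a χ²_2 distribution, whose CDF is 1 − e^{−t/2}.

    import math
    import random

    # Sketch: Monte Carlo check of property 8, T = -2*log(X) ~ chi-squared with 2 d.f.
    # (equivalently, exponential with mean 2, CDF 1 - exp(-t/2)).
    random.seed(3)

    # 1 - random.random() lies in (0, 1], so log() is always defined
    samples = [-2.0 * math.log(1.0 - random.random()) for _ in range(100000)]

    for t in (0.5, 1.0, 2.0, 5.0):
        empirical = sum(1 for s in samples if s <= t) / len(samples)
        theoretical = 1.0 - math.exp(-t / 2.0)
        print(f"t = {t}: empirical CDF {empirical:.4f}, chi^2_2 CDF {theoretical:.4f}")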

3. Multivariate distributions
Suppose Xi is an absolutely continuous (or, discrete) random variable with pdf (or, pmf) fi ,
i = 1, . . . , n, and X1 , . . . , Xn are mutually independent, then the multivariate distribution of the
random vector X = (X1 , . . . , Xn ) has the joint pdf (or, pmf) fX , where

fX (x1 , . . . , xn ) = f1 (x1 ) × · · · × fn (xn ), for each x ∈ Rn .

Thus, if X1, . . . , Xn is a random sample from some distribution with pdf (or pmf) fX, then the joint
pdf (or pmf) of (X1, . . . , Xn) evaluated at the point x ∈ Rn is Π_{i=1}^n fX(xi).
However, if X1, . . . , Xn are not mutually independent, then the above factorization is NOT possible.
In that case the joint distribution of X1, . . . , Xn cannot be expressed in terms of the marginal
distributions. In order to infer about the joint behavior of X1, . . . , Xn, one needs to know the joint
distribution.
A k-dimensional random vector X can be characterized by the joint CDF

FX (x) = P (X1 ≤ x1 , · · · , Xk ≤ xk ) ,

or the moment generating function (MGF, if it exists) E(exp{X′t}), or the characteristic function
E(exp{iX′t}).
A k-dimensional discrete random vector X can also be characterized by its joint pmf

fX (x) = P (X = x) = P (X1 = x1 , · · · , Xk = xk ).

A k-dimensional absolutely continuous random vector X can also be characterized by its joint pdf
fX , where fX satisfies
FX(x) = ∫_{−∞}^{x1} · · · ∫_{−∞}^{xk} fX(t) dt, and (∂^k / ∂x1 · · · ∂xk) FX(x) = fX(x).

Marginal distribution. Let X be a k-dimensional random vector with CDF FX; then the marginal
CDF of the j-th component Xj of X is

FXj(x) = FX(∞, · · · , ∞, x, ∞, · · · , ∞), x ∈ R,

where x appears in the j-th position.
It can be shown that FXj is the CDF of some distribution, and the corresponding distribution is called
the marginal distribution of Xj.

Conditional distribution. Let (X, Y)′ be a discrete random vector. Then the conditional distribution
of X given Y = y, where y ∈ SY, is the discrete distribution with pmf

fX|Y=y(x) = P(X = x | Y = y) = P(X = x, Y = y)/P(Y = y) = f(X,Y)(x, y)/fY(y).

Let (X, Y)′ be an absolutely continuous random vector with joint CDF F(X,Y), joint pdf f(X,Y),
and let y be such that fY(y) > 0. Then the CDF and pdf of X given Y = y, denoted by FX|Y=y and
fX|Y=y, respectively, are defined as

FX|Y=y(x) = lim_{h↓0} P(X ≤ x | y − h < Y ≤ y), and fX|Y=y(x) = f(X,Y)(x, y)/fY(y).

Similarly, if (X, Y)′ is an absolutely continuous random vector and y is such that fY(y) > 0,
then the pdf of X given Y = y is fX|Y=y(x) = f(X,Y)(x, y)/fY(y).

Some multivariate distributions are often of interest in MTH211. One such distribution is the
multivariate normal distribution.

Multivariate normal distribution. The random vector X = (X1 , . . . , Xk )′ is distributed as mul-
tivariate normal distribution with parameters µ ∈ Rk and Σ, where Σ is a k × k symmetric and
positive semi-definite matrix, if for any vector a ∈ Rk , Ta = a′ X ∼ normal (a′ µ, a′ Σa). Notationally,
X ∼ Nk (µ, Σ).
Special case: The two-dimensional normal distribution is called the bivariate normal distribution, and
it has 5 parameters (µx, µy, σx², σy², ρ). If X ∼ N2(µx, µy, σx², σy², ρ), then the pdf of X, fX, is given by

fX(x, y) = (1/(2π σx σy √(1 − ρ²))) exp{ −(1/(2(1 − ρ²))) [ ((x − µx)/σx)² + ((y − µy)/σy)² − 2ρ (x − µx)(y − µy)/(σx σy) ] },

(x, y)′ ∈ R², µx, µy ∈ R, σx, σy > 0, −1 < ρ < 1.
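The bivariate normal density above can be coded directly from the formula (a sketch; the parameter values in the example call are arbitrary illustrations), e.g., to evaluate it on a grid and draw a contour plot like Figure 2.

    import math

    # Sketch: the bivariate normal pdf written above, coded directly from the formula.
    # The parameter values in the example call are arbitrary illustrations.
    def bivariate_normal_pdf(x, y, mu_x, mu_y, sigma_x, sigma_y, rho):
        zx = (x - mu_x) / sigma_x
        zy = (y - mu_y) / sigma_y
        norm_const = 2.0 * math.pi * sigma_x * sigma_y * math.sqrt(1.0 - rho ** 2)
        quad = (zx ** 2 + zy ** 2 - 2.0 * rho * zx * zy) / (2.0 * (1.0 - rho ** 2))
        return math.exp(-quad) / norm_const

    # density at the mean equals 1 / (2*pi*sigma_x*sigma_y*sqrt(1 - rho^2))
    print(bivariate_normal_pdf(0.0, 0.0, mu_x=0.0, mu_y=0.0, sigma_x=1.0, sigma_y=2.0, rho=0.5))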

Figure 2: Contour plot of bivariate normal distribution

Properties (Homework):

1. Let X ∼ Nk(µ, Σ). Then the expectation of X is E(X) = µ, the variance-covariance matrix is
E[{X − E(X)}{X − E(X)}′] = Σ, and the moment generating function is exp{µ′t + t′Σt/2}.
2. (Box-Muller transformation) Let U1, U2 be iid uniform(0, 1). Consider the transformations
Z1 = √(−2 log U1) cos(2πU2) and Z2 = √(−2 log U1) sin(2πU2). Then (Z1, Z2)′ ∼ N2(0, I), where
I is the identity matrix (a simulation check appears after this list).
3. Let X1 ∼ Gamma(α1, β) and X2 ∼ Gamma(α2, β) be independent. Consider the transformation Z = X1/(X1 + X2). Then
Z ∼ beta(α1, α2).
4. Let (X, Y) be jointly distributed as N2(µx, µy, σx², σy², ρ). Then the marginal distribution of X
is normal(µx, σx²). Also, the conditional distribution of Y given X = x is normal(µy + ρσy(x −
µx)/σx, σy²(1 − ρ²)).
5. Let X1, . . . , Xn be iid Bernoulli(p). Then the conditional distribution of X = (X1, . . . , Xn) given X̄n = y is
free of p.

6. Let X1, . . . , Xn be iid Poisson(λ). Then the conditional distribution of X = (X1, . . . , Xn) given X̄n = y is free
of λ.
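The Box-Muller transformation of property 2 is easy to check empirically. The sketch below generates pairs (Z1, Z2) and verifies that their sample means are near 0, sample variances near 1, and sample correlation near 0, as expected for N2(0, I).

    import math
    import random
    import statistics

    # Sketch: generate standard normal pairs via the Box-Muller transformation of
    # property 2 and check their sample means, variances and correlation.
    random.seed(4)

    z1, z2 = [], []
    for _ in range(50000):
        u1, u2 = 1.0 - random.random(), random.random()   # u1 in (0, 1] avoids log(0)
        r = math.sqrt(-2.0 * math.log(u1))
        z1.append(r * math.cos(2.0 * math.pi * u2))
        z2.append(r * math.sin(2.0 * math.pi * u2))

    m1, m2 = statistics.mean(z1), statistics.mean(z2)
    cov = sum((a - m1) * (b - m2) for a, b in zip(z1, z2)) / (len(z1) - 1)
    corr = cov / (statistics.stdev(z1) * statistics.stdev(z2))

    print("means:", m1, m2)                                          # both ~ 0
    print("variances:", statistics.variance(z1), statistics.variance(z2))  # both ~ 1
    print("correlation:", corr)                                      # ~ 0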

Figure 3: Contour plots of bivariate normal distribution with different parameters

Week 2

Some important statistics and their sampling distribution


1. Sample mean. Let X1, . . . , Xn be a random sample from some distribution F. Then X̄n =
n^{−1} Σ_{i=1}^n Xi is called the sample mean.

Properties of sample mean.

(a) Let X1 , . . . , Xn be a random sample from some distribution F with expectation µ and finite
variance σ 2 . Then E(X̄n ) = µ and var(X̄n ) = σ 2 /n. [Proof]
(b) Let X1 , . . . , Xn be a random sample from normal(µ, σ 2 ) distribution. Then the sampling
distribution of X̄n is normal(µ, σ 2 /n). [Proof]
(c) Let X1 , . . . , Xn be a random sample from some distribution F with expectation µ, and finite
variance σ 2 . Then X̄n is the best linear unbiased estimator (BLUE) of µ. [Proof]

2. Sample variance. Let X1, . . . , Xn be a random sample from some distribution F. Then
Sn² = n^{−1} Σ_{i=1}^n (Xi − X̄n)² = n^{−1} Σ_{i=1}^n Xi² − X̄n² is called the sample variance.
The positive square root of the sample variance is called the sample standard deviation, and is denoted
by Sn.

Properties of sample variance.

(a) Let X1, . . . , Xn be a random sample from some distribution F with expectation µ and finite
variance σ². Then E[nSn²/(n − 1)] = σ². [Proof]
The statistic Sn*² = nSn²/(n − 1) is sometimes also referred to as the sample variance. We
will term this statistic the unbiased sample variance.
(b) Let X1, . . . , Xn be a random sample from the normal(µ, σ²) distribution. Then the sampling
distribution of nSn² is σ²χ²_{n−1}, i.e., nSn²/σ² ∼ χ²_{n−1}. [Proof]
(c) (Joint distribution of sample mean and variance for normal samples) Let X1, . . . , Xn
be a random sample from the normal(µ, σ²) distribution. Then X̄n and Sn² are independently
distributed. [Proof]

Thus, Tµ = √n(X̄n − µ)/Sn* ∼ t_{n−1}. (Why?)
3. Correlation coefficient. Let (X1, Y1), . . . , (Xn, Yn) be a bivariate random sample from some
distribution F. Then the sample correlation coefficient is

r_{x,y} = [n^{−1} Σ_{i=1}^n (Xi − X̄n)(Yi − Ȳn)] / (SX SY) = [n^{−1} Σ_{i=1}^n Xi Yi − X̄n Ȳn] / (SX SY),

where SX and SY are the sample standard deviations of the X's and Y's, respectively.
The numerator in the expression of r_{x,y} is called the sample covariance.
4. Multivariate extensions of sample mean and variance. Let X1, . . . , Xn be a random
sample from some multivariate distribution F. Then the sample mean X̄n and the sample variance-covariance
matrix Sn are defined as

X̄n = n^{−1} Σ_{i=1}^n Xi, and Sn = n^{−1} Σ_{i=1}^n (Xi − X̄n)(Xi − X̄n)′ = n^{−1} Σ_{i=1}^n Xi Xi′ − X̄n X̄n′.

Properties of sample mean and variance.
(a) If the samples are k-dimensional, then X̄n is a k-vector and Sn is a k × k positive semi-definite
matrix. The j-th diagonal element of Sn is the sample variance of the j-th component of
X, and the (i, j)-th element of Sn is the sample covariance between the i-th and j-th components of
X. (WHY?)
(b) Let X1, . . . , Xn be a random sample from some multivariate distribution F with expectation
µ and variance-covariance matrix Σ having finite components. Then E(X̄n) = µ, var(X̄n) =
n^{−1}Σ, and E(nSn/(n − 1)) = Σ. [Proof]
5. Sample moments. Let X1, . . . , Xn be a random sample from some distribution F. Then the
r-th order raw moment m′r and the r-th order central moment mr are defined as

m′r = n^{−1} Σ_i Xi^r, and mr = n^{−1} Σ_i (Xi − X̄n)^r, r > 0.

Properties of sample moments.


(a) The sample central moments can be derived from the sample raw moments and vice versa.
(b) Let X1, . . . , Xn be a random sample from some distribution F with r-th population raw moment
µ′r = E(X^r) < ∞. Then E(m′r) = µ′r. [Proof]
6. Order statistics. Let X1, . . . , Xn be a random sample from some distribution F. Then the
r-th order statistic X(r) is the r-th smallest of X1, . . . , Xn, r = 1, . . . , n. Therefore, X(1) =
min{X1, . . . , Xn} and X(n) = max{X1, . . . , Xn}.

Properties of order statistics.


(a) Let X1, . . . , Xn be a random sample from some distribution F with pdf fX. Then the joint pdf of (X(1), . . . , X(n)) is

f(X(1),...,X(n))(x) = n! Π_{i=1}^n fX(xi) if x1 < x2 < · · · < xn, and 0 otherwise. [Proof]

(b) Let X1, . . . , Xn be a random sample from some distribution with CDF FX. Then the CDF
of X(n), say Gn, is given by

Gn(t) = P(X(n) ≤ t) = {P(X ≤ t)}^n = FX(t)^n (a simulation check appears at the end of this subsection). [Proof]

(c) Let X1, . . . , Xn be a random sample from some distribution with CDF FX. Then the CDF
of X(1), say Hn, is given by

Hn(t) = P(X(1) ≤ t) = 1 − {1 − P(X ≤ t)}^n = 1 − {1 − FX(t)}^n.

(d) Let X1, . . . , Xn be a random sample from some distribution with CDF FX and pdf fX.
Then the pdf of X(r), say gr, is given by

gr(t) = [n!/((r − 1)!(n − r)!)] FX(t)^{r−1} fX(t) {1 − FX(t)}^{n−r}.

(e) Let X1, . . . , Xn be a random sample from some distribution with CDF FX and pdf fX. Then
the joint pdf of X(r) and X(s) with r < s, say g_{r,s}, is given by

g_{r,s}(w, t) = [n!/((r − 1)!(s − r − 1)!(n − s)!)] FX(w)^{r−1} {FX(t) − FX(w)}^{s−r−1} fX(w) fX(t) {1 − FX(t)}^{n−s} if w < t,
and 0 otherwise.

7. Sample median. Let X1, . . . , Xn be a random sample from some distribution F. Then the
sample median X̃me is given by

X̃me = X((n+1)/2) if n is odd, and X̃me = {X(n/2) + X(n/2+1)}/2 if n is even.
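Property (b) of order statistics, Gn(t) = FX(t)^n, can be verified by simulation. The sketch below uses a uniform(0, 1) sample (an arbitrary illustrative choice), for which FX(t) = t on [0, 1], so the CDF of the maximum should be t^n.

    import random

    # Sketch: check G_n(t) = P(X_(n) <= t) = F_X(t)^n for the maximum of a
    # uniform(0, 1) sample, where F_X(t) = t on [0, 1].
    random.seed(5)
    n, replications = 10, 100000

    maxima = [max(random.random() for _ in range(n)) for _ in range(replications)]

    for t in (0.5, 0.8, 0.95):
        empirical = sum(1 for m in maxima if m <= t) / replications
        print(f"t = {t}: empirical {empirical:.4f}, theoretical t^n = {t ** n:.4f}")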

IV. Large Sample Results. Two large sample results will be useful in MTH211.

1. (Weak Law of Large Numbers, WLLN) Let X1, . . . , Xn be a random sample from a
population with E(g(X)) = η < ∞, where g : R → R is a function. Then n^{−1} Σ_{i=1}^n g(Xi) →p η
as n → ∞, i.e., for any ϵ > 0,

lim_{n→∞} P(|n^{−1} Σ_{i=1}^n g(Xi) − η| > ϵ) = 0.

2. (Central Limit Theorem, CLT) Let X1, . . . , Xn be a random sample from some distribution
with expectation µ and finite variance σ². Then Tn^{µ,σ} = √n(X̄n − µ)/σ →d normal(0, 1); that
is, for any x ∈ R, the CDF of Tn^{µ,σ}, say Gn, satisfies

Gn(x) → Φ(x), as n → ∞,

where Φ is the CDF of the normal(0, 1) distribution (a simulation sketch appears after the corollary below).

Corollary:

(a) For any r > 0 such that µ′_{2r} < ∞, m′r →p µ′r.

(b) Let X1, · · · , Xn be a random sample from uniform(0, θ), θ > 0. Then X(n) →d δ{θ}, where δ{θ} is
the degenerate distribution with all its probability mass at θ.
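The CLT can be illustrated numerically. The sketch below uses an exponential(1) population (an arbitrary choice with µ = σ = 1), computes the standardized mean Tn = √n(X̄n − µ)/σ for many replications, and compares its empirical CDF with Φ.

    import math
    import random

    # Sketch: illustrate the CLT for an exponential(1) population (mean 1, variance 1),
    # chosen arbitrarily for illustration. The standardized mean should be close to
    # normal(0, 1) for moderately large n.
    random.seed(6)
    n, replications = 50, 20000
    mu, sigma = 1.0, 1.0   # expectation and SD of exponential(1)

    def standardized_mean():
        sample = [random.expovariate(1.0) for _ in range(n)]
        return math.sqrt(n) * (sum(sample) / n - mu) / sigma

    t_values = [standardized_mean() for _ in range(replications)]

    def std_normal_cdf(x):
        return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

    for x in (-1.0, 0.0, 1.0, 2.0):
        empirical = sum(1 for t in t_values if t <= x) / replications
        print(f"x = {x}: empirical CDF {empirical:.4f}, Phi(x) = {std_normal_cdf(x):.4f}")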
