
MTH211A: Theory of Statistics

Module 1: Introduction to Inference


January 30, 2024

Week 0

I. What is statistical inference?


The basic idea of statistical inference is to learn about an underlying truth from data.
We illustrate this with some examples.

Example 1. Suppose a 25-acre piece of land is split into 10,000 plots of equal area, and a seed
of type A is planted in each plot on 1 August 2023. After one month, 15 plots are surveyed at
random, and the following germination status is observed (see Table 1).

sample plot 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
Germination status 1 0 1 0 1 0 1 1 1 0 1 1 0 1 1

Table 1: Germination status of 15 sample plots

Here germination status 1 indicates that the seed has germinated, while 0 indicates that it has
not. We are interested in the proportion of germination, θ.

To answer this statistical problem we need to build an appropriate model and feed the observed
data into that model.

First note that, as we cannot enumerate the entire 25 acres of land, we need to infer the proportion
θ from a randomly selected sub-population, called a sample.
How are the data related to the sample?

The word ‘sample’ is typically used to mean a part of a totality. Here the totality is called the
population. The population in this example is the collection of all the plots in the 25 acres of land.
The fact that some units are randomly picked from the population induces two characteristics:
(i) The germination statuses of the sampled units are not known in advance, hence they are random.
The data in Table 1 are the realization of the random sample.
(ii) The sampled units are expected to share the characteristics of the population. Thus,
if the population has a high rate of germination (θ is large), then each sampled seed also has a
high probability of germination.
To answer the above problem, we have to make a few assumptions about the underlying truth. This
set of assumptions taken together is called the model, or statistical model. In this problem we may
assume that (a1) the soil condition and other environment-related factors remain nearly identical
over the entire 25 acres of land. Further, we may assume that (a2) the sampled plots are independent,
in the sense that the germination status of one plot does not affect that of the other plots. The above
two assumptions together build the following model:
Define Xi: germination status of the i-th sampled plot, i = 1, . . . , 15. Then X1, . . . , X15 are iid Bernoulli(θ)
for some 0 < θ < 1. Due to (ii), θ is the probability that the seed in the i-th sampled
plot germinates. Further, due to assumption (a1), this probability is the same across plots (identical
distribution). The unknown probability θ is called a parameter of the model.
As we do not know θ, we need to approximate it using the sample realizations, i.e., the data. If the
iid Bernoulli(θ) model is true, then the realizations of X1, · · · , X15 provide an idea about θ. For
example, from Table 1 we see that a total of Σi Xi = 10 plants germinated out of 15. Thus,
X̄ = 10/15 = 2/3 seems to be a reasonable estimate of the rate of germination. Statistical inference is the
procedure of coming up with an appropriate method of estimating the parameter θ, or a function
g(θ) of the parameter, based on a sample.
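As an illustration (not part of the original notes), the estimation step of Example 1 can be carried out in a few lines of Python; the data below are the Table 1 realizations, and the model is the iid Bernoulli(θ) model assumed above.

    # Sketch of the estimation step in Example 1 (illustrative, not from the notes).
    # Assumes the iid Bernoulli(theta) model; the data are the Table 1 realizations.
    germination = [1, 0, 1, 0, 1, 0, 1, 1, 1, 0, 1, 1, 0, 1, 1]  # x_1, ..., x_15

    n = len(germination)              # sample size, n = 15
    theta_hat = sum(germination) / n  # sample mean X-bar = 10/15 = 2/3

    print(f"estimated germination proportion: {theta_hat:.3f}")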

Example 2. Suppose we are interested in estimating the acceleration due to gravity g. The usual way
to estimate g is via the pendulum experiment, where g ≈ 4π²l/T², l being the length of a
simple pendulum and T the time period of one oscillation. Suppose the length of a
simple pendulum is 75 cm. Due to variation arising from several factors, such as the skill of
the experimenter and measurement error, T cannot be measured exactly. Instead, 10 repeated
measurements are taken, which are given in Table 2.

Repetitions 1 2 3 4 5 6 7 8 9 10
Measurements (s) 1.822 1.684 1.688 1.758 1.739 1.805 1.809 1.702 1.614 1.764

Table 2: Results of 10 repeated measurements

Set your assumptions to build a model. Identify the parameter(s). Can you express the quantity
of interest g as a function of the parameter(s)? Can you interpret the data in Table 2 as the realization
of a sample from some population? How would you feed the data into the model and estimate
g?
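One possible sketch of an answer, under the assumption (ours, not the notes') that the measurements T1, . . . , T10 are an iid sample whose expectation is the true period, is to estimate the period by the sample mean T̄ and plug it into g ≈ 4π²l/T̄²:

    import math

    # One possible sketch for Example 2 (an assumption, not the notes' prescribed solution):
    # treat the 10 measurements as an iid sample of the true period, estimate it by the
    # sample mean, and plug into g ~ 4*pi^2*l / T^2 with l = 0.75 m.
    measurements = [1.822, 1.684, 1.688, 1.758, 1.739, 1.805, 1.809, 1.702, 1.614, 1.764]  # seconds
    l = 0.75  # pendulum length in metres

    T_bar = sum(measurements) / len(measurements)   # estimated period
    g_hat = 4 * math.pi ** 2 * l / T_bar ** 2       # estimated acceleration due to gravity

    print(f"T-bar = {T_bar:.4f} s, estimated g = {g_hat:.3f} m/s^2")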

II. Now we formally define the different ingredients of statistical inference.
(a) Population. In statistical inference we seek information about some numerical characteristic
of a collection of units, called the population. A population can be finite or infinite.
Examples. (i) Suppose you are interested in the average grade obtained by students in
MTH211. Then the population consists of all students who took this course in past
years, are taking it this year, or will take it in future years. (ii) Suppose you are interested
in the proportion of homeless people living in Kanpur; then the population consists of every
individual who lives in Kanpur. (iii) Suppose you are interested in the lifetime of Samsung
mobiles of a particular model A; then the population consists of all Samsung mobiles of model
A.
(b) Sample. It is often not possible to enumerate the entire population due to several constraints,
e.g., time, cost, inaccessibility, etc. In such cases, one examines a part of the
population, called a sample. The sample must be a good representative of the population. How
does one select a representative sample? One way is to select the sample units at random.
(c) Random sample. Suppose we are interested in the proportion of students in IITK whose
weekly study time was between 40 and 60 hours on average in 2023. If we had enumerated all
students of IITK each week throughout 2023 and prepared a grand list of average study
times, then it would be possible to calculate the proportion exactly. Suppose the histogram
in Figure 1 represents the completely enumerated data. Let the true proportion of interest be
µ; then µ is the relative frequency of the purple portion in Figure 1.

Figure 1: Histogram of average weekly study time (in hours) of students of IITK

Now, suppose we take a random element from this population (i.e., we randomly enquire of a
student of IITK). Prior to observing the sample, its realization is a random phenomenon to us,
and we expect the features of the population to be present in the sample. Let X denote the
random variable indicating the average weekly study time of the randomly selected student.
Then the probability P(X ∈ (40, 60)) is expected to be the same as µ. Further, this phenomenon
remains unaltered if we replace the numbers 40 and 60 by arbitrary a, b ∈ R with a < b.
Therefore, the probability distribution of the random sample is expected to be the same as the
relative frequency distribution of the population. Suppose we denote this distribution by F.
Then we say that the random sample X is distributed as F.
Now, suppose instead of considering a random sample of size 1, we consider a random sample
of size n (i.e., we randomly select n students and enquire). For each of these samples, we have
an associated random variable. Let us denote these by X1, . . . , Xn. Then we expect that
each Xi has the same underlying probability distribution (marginally). Thus Xi is identically
distributed as the distribution F, for each i = 1, . . . , n.
Finally, unless otherwise mentioned, there is no reason to believe that the samples are dependent.
Thus, we assume that the random samples are mutually independent.

Combining the above facts, the term “X1, . . . , Xn is a random sample from F” indicates that
X1, . . . , Xn are iid with common distribution F. Here F can be the probability distribution
of some random variable, for example, normal, binomial, etc. It depends on the nature of the
underlying population. In the above example (see Figure 1), it is natural to consider F to be a
mixture of two normal distributions, one centered around 20 with a narrow standard deviation,
and the other centered around 55 with a large standard deviation.
(d) Statistical Model. In practice, it is not feasible to see the relative frequency distribution
of the population and choose F accordingly. Instead, statisticians make
assumptions about F based on their experience, past records, or a pilot survey. This set of
assumptions (including the assumptions on the samples, e.g., that the sample units are iid),
taken together, is called the statistical model. The statistical model must be a reasonable one.
Examples. Throughout 2020-2022, many mathematicians, statisticians, computer scientists,
and epidemiologists tried to model the growth of COVID-19. Although COVID-19 was a
newly discovered disease, the growth modelling was done using past knowledge of pandemic
growth mechanisms. See this article for a comprehensive review. Based on the assumed model,
scientists tried to infer the number of affected persons on a future date, or the time to the eventual
eradication of the disease, etc.
(e) Parameter. Sometimes the form of the underlying probability distribution F is assumed to
be known except for some constants. For instance, in the previous example, a statistician
may suspect the bimodal nature of students' study time and the bell-shaped nature of each
of the sub-populations. However, s/he may not know the locations of the modes, the
spreads, or the proportion of students in each sub-population. Then s/he may set F to be a
mixture of two normal distributions, i.e., π × normal(µ1, σ1²) + (1 − π) × normal(µ2, σ2²). Thus,
F is completely specified except for the unknown constants π (proportion of students in the first
sub-population), µi (locations of the sub-populations) and σi² (scales of the sub-populations),
i = 1, 2. These unknown quantities are called parameters.
To answer different questions related to this population, one needs to make inference about these
parameters only. This type of inference is called parametric inference. Another class of
inference problems is solved without assuming a particular form of F; this is called
non-parametric inference. In MTH211, we will consider parametric inference only.
(f) Statistic. In parametric as well as non-parametric inference, the ultimate goal is to infer
a population feature, say µ, based on sample realizations. Statisticians often put forward a
summary function of the samples as an estimate of µ. For instance, in the germination
example (see Example 1), X̄ was used to estimate θ; in the study-time example, the sample
proportion of students having an average weekly study time between 40 and 60 hours can be used
to estimate µ (a small simulation sketch follows this list). Such functions of observable samples are called sample statistics.
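To tie (c), (e) and (f) together, here is a small simulation sketch: it draws a sample from a hypothetical two-component normal mixture (the mixture weight, means and standard deviations below are made-up illustrative values, not quantities from Figure 1) and computes the sample proportion of study times in (40, 60) as a statistic estimating µ.

    import random

    # Illustrative sketch only: the mixture weight, means and SDs below are made-up
    # values for a population like Figure 1, not numbers from the notes.
    random.seed(1)

    def draw_study_time():
        # mixture: pi * normal(mu1, sd1^2) + (1 - pi) * normal(mu2, sd2^2)
        if random.random() < 0.4:              # pi = 0.4 (hypothetical)
            return random.gauss(20.0, 4.0)     # first sub-population
        return random.gauss(55.0, 12.0)        # second sub-population

    n = 500
    sample = [draw_study_time() for _ in range(n)]

    # sample statistic: proportion of sampled students with study time in (40, 60)
    mu_hat = sum(1 for x in sample if 40 < x < 60) / n
    print(f"estimated proportion in (40, 60): {mu_hat:.3f}")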

Some notations.

1. Random quantities, including random samples, will be indicated by capital letters, X1, X2, · · · .
The realizations will be indicated by small letters, x1, x2, . . ., e.g., P(X = x).
2. Boldface will be used to indicate vectors, e.g., X indicates a random vector and x a vector
of realizations.

3. The parameters of a distribution are treated as unknown fixed quantities in frequentist inference,
and will be indicated in Greek letters, for example, µ, σ, etc. Here also, boldface will be used to
indicate parameter vectors, for example, µ.

Week 1

III. More on statistics and sampling distributions. Let X1, . . . , Xn be a random sample of
size n from a population F. The collection of all possible values of (X1, . . . , Xn) is called the
sample space. As random variables take values in R, the sample space is a subset
of Rn. Let T(·) be a real (or vector) valued function whose domain includes the sample space of
(X1, . . . , Xn). Then the random variable (or vector) Y = T(X1, . . . , Xn) is called a statistic.
As (X1 , . . . , Xn ) is random, so is a statistic Y = T (X1 , . . . , Xn ). The probability distribution of
a statistic is called the sampling distribution of the statistic.
Note that a statistic does NOT involve a parameter. However, the sampling distribution of a
statistic may involve parameters. For example, let X1, . . . , Xn be a random sample of size n from
the normal(µ, σ²) distribution. Then the sample mean n^{−1} Σ_{i=1}^n Xi is a statistic, and its distribution is normal(µ, σ²/n).
However, Σ_{i=1}^n (Xi − µ)/S is not a statistic, as it involves the unknown parameter µ.
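A sampling distribution can be visualised by simulation. The sketch below (with arbitrary illustrative choices of µ, σ and n) draws many normal(µ, σ²) samples, computes the statistic X̄n for each, and compares the mean and variance of the simulated X̄n values with µ and σ²/n.

    import random
    import statistics

    # Sketch: approximate the sampling distribution of the statistic X-bar_n by simulation.
    # mu, sigma and n below are arbitrary illustrative choices.
    random.seed(2)
    mu, sigma, n, replications = 10.0, 2.0, 25, 5000

    xbars = []
    for _ in range(replications):
        sample = [random.gauss(mu, sigma) for _ in range(n)]
        xbars.append(sum(sample) / n)        # value of the statistic for this sample

    print(f"mean of X-bar: {statistics.mean(xbars):.3f} (theory: {mu})")
    print(f"variance of X-bar: {statistics.variance(xbars):.4f} (theory: {sigma ** 2 / n})")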

Support of a random variable


Let X be a discrete random variable. Then the support of X, say SX , is the collection of points x
in R such that P (X = x) > 0, i.e., SX = {x ∈ R : P (X = x) > 0}.
Let X be a continuous random variable with CDF FX. Then the support of X, say SX, is the
collection of points x in R such that X has positive probability mass in every non-trivial neighborhood of x,
i.e., SX = {x ∈ R : FX(x + h) − FX(x − h) > 0 for all h > 0}.
Thus, if X is a discrete (or continuous) random variable with pmf (or pdf) fX and support SX,
then fX(x) = 0 for every x ∉ SX. (WHY? Is the converse of this statement true?)

Some important distributions


1. Discrete distributions

(a) Bernoulli. Let X ∼ Bernoulli(p) distribution. Then

P(X = x) = p^x (1 − p)^{1−x}, SX = {0, 1}, 0 < p < 1.

(b) Binomial. Let X ∼ binomial(n, p) distribution. Then

P(X = x) = (n choose x) p^x (1 − p)^{n−x}, SX = {0, 1, . . . , n}, 0 < p < 1, n ∈ N.

(c) Poisson. Let X ∼ Poisson(λ) distribution. Then

P(X = x) = exp{−λ} λ^x / x!, SX = {0, 1, . . .}, λ > 0.

(d) Geometric. Let X ∼ geometric(p) distribution. Then

P(X = x) = (1 − p)^{x−1} p, SX = {1, 2, . . .}, 0 < p < 1.

5
2. Continuous distributions
(a) Uniform. Let X ∼ uniform(α, β) distribution. Then the probability density function of X,
fX(·), is given by

fX(x) = 1/(β − α), α < x < β, SX = [α, β], α, β ∈ R, β > α.

(b) Gamma. Let X ∼ Gamma(α, β) distribution. Then the probability density function of X, fX(·),
is given by

fX(x) = (β^α / Γ(α)) x^{α−1} e^{−βx}, x > 0, SX = [0, ∞), α > 0, β > 0.

(c) Exponential. Let X ∼ exponential(λ) distribution. Then the probability density function of
X, fX(·), is given by

fX(x) = λ e^{−λx}, x > 0, SX = [0, ∞), λ > 0.

Clearly, exponential(λ) is a special case of the Gamma distribution with parameters 1 and λ.

(d) Normal. Let X ∼ normal(µ, σ²) distribution. Then the probability density function of X, fX(·),
is given by

fX(x) = (1/(√(2π) σ)) exp{−(x − µ)² / (2σ²)}, x ∈ SX = R, µ ∈ R, σ > 0.

(e) Beta. Let X ∼ beta(α, β) distribution. Then the probability density function of X, fX(·), is
given by

fX(x) = (Γ(α + β)/(Γ(α)Γ(β))) x^{α−1} (1 − x)^{β−1}, 0 < x < 1, SX = [0, 1], α > 0, β > 0.

(f) Cauchy. Let X ∼ Cauchy(µ, σ) distribution. Then the probability density function of X, fX(·),
is given by

fX(x) = (1/(πσ)) [1 + ((x − µ)/σ)²]^{−1}, x ∈ SX = R, µ ∈ R, σ > 0.

(g) Chi-squared distribution. Let Xi ∼ normal(0, 1) distribution, iid, i = 1, . . . , n. Then T =
Σ_{i=1}^n Xi² follows a chi-squared distribution with degrees of freedom (d.f.) n, notationally T ∼ χ²_n,
and the pdf of T, fT, is given by

fT(t) = (1/(2^{n/2} Γ(n/2))) t^{n/2−1} e^{−t/2}, t > 0, ST = [0, ∞), n ∈ N.

The chi-squared distribution with d.f. n is a special case of the Gamma distribution with parameters n/2
and 1/2.

(h) F distribution. Let X ∼ χ²_{n1}, Y ∼ χ²_{n2}, with X and Y independently distributed. Then
F = n2 X/(n1 Y) follows an F distribution with d.f. n1 and n2, notationally F ∼ F_{n1,n2}, and the
pdf of F, fF, is given by

fF(x) = (Γ((n1 + n2)/2)/(Γ(n1/2)Γ(n2/2))) (n1/n2)^{n1/2} x^{n1/2−1} (1 + (n1/n2) x)^{−(n1+n2)/2}, x > 0, SF = [0, ∞).

(i) t distribution. Let X ∼ normal(0, 1), Y ∼ χ²_n, with X and Y independently distributed. Then
W = X/√(Y/n) follows a t-distribution with d.f. n, notationally W ∼ t_n, and the pdf of W, fW, is
given by

fW(x) = (Γ((n + 1)/2)/(√(πn) Γ(n/2))) (1 + x²/n)^{−(n+1)/2}, x ∈ SW = R.
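As a quick numerical sanity check (an illustration, not part of the notes), any of the densities above can be coded directly from its formula and integrated numerically; the result should be approximately 1. The sketch below does this for the Gamma and normal densities, with arbitrary parameter values.

    import math

    # Sketch: code two of the densities above directly from their formulas and check
    # numerically that each integrates to (approximately) 1.

    def gamma_pdf(x, alpha, beta):
        # f(x) = beta^alpha / Gamma(alpha) * x^(alpha - 1) * exp(-beta * x), x > 0
        return beta ** alpha / math.gamma(alpha) * x ** (alpha - 1) * math.exp(-beta * x)

    def normal_pdf(x, mu, sigma):
        # f(x) = (1 / (sqrt(2*pi) * sigma)) * exp(-(x - mu)^2 / (2 * sigma^2))
        return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)

    def integrate(f, a, b, m=100000):
        # simple midpoint Riemann-sum approximation of the integral of f over [a, b]
        h = (b - a) / m
        return sum(f(a + (i + 0.5) * h) for i in range(m)) * h

    print(integrate(lambda x: gamma_pdf(x, alpha=2.5, beta=1.5), 1e-9, 50))  # approx. 1
    print(integrate(lambda x: normal_pdf(x, mu=0.0, sigma=1.0), -10, 10))    # approx. 1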

Properties (Homework):
1. Find the expectations and variances of each of the above distributions (if they exist).
2. (Additive properties) Prove the following statements using moment generating functions or
characteristic functions.

(a) Let Xi ∼ binomial(ni, p) independently, for i = 1, . . . , k; then T = Σ_{i=1}^k Xi follows binomial(Σ_i ni, p).

(b) Let Xi ∼ Poisson(λi) independently, for i = 1, . . . , n; then T = Σ_{i=1}^n Xi follows Poisson(Σ_i λi).

(c) Let Xi ∼ normal(µi, σi²) independently, for i = 1, . . . , n; then T = Σ_{i=1}^n Xi follows normal(Σ_i µi, Σ_i σi²).

(d) Let Xi ∼ Gamma(αi, β) independently, for i = 1, . . . , n; then T = Σ_{i=1}^n Xi follows Gamma(Σ_i αi, β).

(e) Let Xi ∼ χ²_{ni} independently, for i = 1, . . . , k; then T = Σ_{i=1}^k Xi follows χ²_N, where N = Σ_i ni.

3. Let X ∼ normal(µ, σ²) distribution; then T = aX + b ∼ normal(aµ + b, a²σ²).

4. Let X ∼ Gamma(α, β) distribution; then T = aX ∼ Gamma(α, β/a).

5. Let X ∼ beta(n/2, m/2) distribution; then T = mX/{n(1 − X)} ∼ F_{n,m}.

6. Let X ∼ uniform(0, 1) distribution and α > 0; then T = X^{1/α} ∼ beta(α, 1).

7. Let X ∼ Cauchy(0, 1) distribution; then T = 1/(1 + X²) ∼ beta(0.5, 0.5).

8. Let X ∼ uniform(0, 1) distribution; then T = −2 log X ∼ χ²_2 (a simulation check of this property appears after this list).

9. Let X be distributed as some absolutely continuous distribution with CDF GX; then T = GX(X) ∼
uniform(0, 1).
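Many of these homework properties can be checked by Monte Carlo simulation. The following sketch checks property 8: T = −2 log X should follow a χ²_2 distribution, whose CDF is 1 − e^{−t/2}.

    import math
    import random

    # Sketch: Monte Carlo check of property 8, T = -2*log(X) ~ chi-squared with 2 d.f.
    # (equivalently, exponential with mean 2, CDF 1 - exp(-t/2)).
    random.seed(3)

    # 1 - random.random() lies in (0, 1], so log() is always defined
    samples = [-2.0 * math.log(1.0 - random.random()) for _ in range(100000)]

    for t in (0.5, 1.0, 2.0, 5.0):
        empirical = sum(1 for s in samples if s <= t) / len(samples)
        theoretical = 1.0 - math.exp(-t / 2.0)
        print(f"t = {t}: empirical CDF {empirical:.4f}, chi^2_2 CDF {theoretical:.4f}")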

3. Multivariate distributions
Suppose Xi is an absolutely continuous (or, discrete) random variable with pdf (or, pmf) fi ,
i = 1, . . . , n, and X1 , . . . , Xn are mutually independent, then the multivariate distribution of the
random vector X = (X1 , . . . , Xn ) has the joint pdf (or, pmf) fX , where

fX (x1 , . . . , xn ) = f1 (x1 ) × · · · × fn (xn ), for each x ∈ Rn .

Thus, if X1, . . . , Xn is a random sample from some distribution with pdf (or pmf) fX, then the joint
pdf (or pmf) of (X1, . . . , Xn) evaluated at the point x ∈ Rn is Π_{i=1}^n fX(xi).
However, if X1, . . . , Xn are not mutually independent, then the above factorization is NOT possible.
In that case the joint distribution of X1, . . . , Xn cannot be expressed in terms of the marginal
distributions. In order to infer about the joint behavior of X1, . . . , Xn, one needs to know the joint
distribution.
A k-dimensional random vector X can be characterized by the joint CDF

FX (x) = P (X1 ≤ x1 , · · · , Xk ≤ xk ) ,

or the moment generating function (MGF, if it exists) E(exp{X′t}), or the characteristic function
E(exp{iX′t}).
A k-dimensional discrete random vector X can also be characterized by its joint pmf

fX (x) = P (X = x) = P (X1 = x1 , · · · , Xk = xk ).

A k-dimensional absolutely continuous random vector X can also be characterized by its joint pdf
fX , where fX satisfies
FX(x) = ∫_{−∞}^{x1} · · · ∫_{−∞}^{xk} fX(t) dt, and (∂^k / ∂x1 · · · ∂xk) FX(x) = fX(x).

Marginal distribution. Let X be a k-dimensional random vector with CDF FX; then the marginal
CDF of the j-th component Xj of X is

FXj(x) = FX(∞, · · · , ∞, x, ∞, · · · , ∞), x ∈ R,

where x appears in the j-th position.
It can be shown that FXj is the CDF of some distribution, and the corresponding distribution is called
the marginal distribution of Xj.

Conditional distribution. Let (X, Y)′ be a discrete random vector. Then the conditional distribution
of X given Y = y, where y ∈ SY, is the discrete distribution with pmf

fX|Y=y(x) = P(X = x | Y = y) = P(X = x, Y = y)/P(Y = y) = f(X,Y)(x, y)/fY(y).

Let (X, Y)′ be an absolutely continuous random vector with joint CDF F(X,Y), joint pdf f(X,Y),
and let y be such that fY(y) > 0. Then the CDF and pdf of X given Y = y, denoted by FX|Y=y and
fX|Y=y, respectively, are defined as

FX|Y=y(x) = lim_{h↓0} P(X ≤ x | y − h < Y ≤ y), and fX|Y=y(x) = f(X,Y)(x, y)/fY(y).

Similarly, if (X, Y)′ is an absolutely continuous random vector and y is such that fY(y) > 0,
then the pdf of X given Y = y is fX|Y=y(x) = f(X,Y)(x, y)/fY(y).

Some multivariate distributions are often of interest in MTH211. One such distribution is the
multivariate normal distribution.

Multivariate normal distribution. The random vector X = (X1 , . . . , Xk )′ is distributed as mul-
tivariate normal distribution with parameters µ ∈ Rk and Σ, where Σ is a k × k symmetric and
positive semi-definite matrix, if for any vector a ∈ Rk , Ta = a′ X ∼ normal (a′ µ, a′ Σa). Notationally,
X ∼ Nk (µ, Σ).
Special case: The two-dimensional normal distribution is called the bivariate normal distribution, and
it has 5 parameters (µx, µy, σx², σy², ρ). If X ∼ N2(µx, µy, σx², σy², ρ), then the pdf of X, fX, is given by

fX(x, y) = (1/(2π σx σy √(1 − ρ²))) exp{ −(1/(2(1 − ρ²))) [ ((x − µx)/σx)² + ((y − µy)/σy)² − 2ρ (x − µx)(y − µy)/(σx σy) ] },

(x, y)′ ∈ R², µx, µy ∈ R, σx, σy > 0, −1 < ρ < 1.
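The bivariate normal density above can be coded directly from the formula (a sketch; the parameter values in the example call are arbitrary illustrations), e.g., to evaluate it on a grid and draw a contour plot like Figure 2.

    import math

    # Sketch: the bivariate normal pdf written above, coded directly from the formula.
    # The parameter values in the example call are arbitrary illustrations.
    def bivariate_normal_pdf(x, y, mu_x, mu_y, sigma_x, sigma_y, rho):
        zx = (x - mu_x) / sigma_x
        zy = (y - mu_y) / sigma_y
        norm_const = 2.0 * math.pi * sigma_x * sigma_y * math.sqrt(1.0 - rho ** 2)
        quad = (zx ** 2 + zy ** 2 - 2.0 * rho * zx * zy) / (2.0 * (1.0 - rho ** 2))
        return math.exp(-quad) / norm_const

    # density at the mean equals 1 / (2*pi*sigma_x*sigma_y*sqrt(1 - rho^2))
    print(bivariate_normal_pdf(0.0, 0.0, mu_x=0.0, mu_y=0.0, sigma_x=1.0, sigma_y=2.0, rho=0.5))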

Figure 2: Contour plot of bivariate normal distribution

Properties (Homework):

1. Let X ∼ Nk(µ, Σ). Then the expectation of X is E(X) = µ, the variance-covariance matrix is
E[{X − E(X)}{X − E(X)}′] = Σ, and the moment generating function is exp{µ′t + t′Σt/2}.
2. (Box-Muller transformation) Let U1, U2 be iid uniform(0, 1). Consider the transformations
Z1 = √(−2 log U1) cos(2πU2) and Z2 = √(−2 log U1) sin(2πU2). Then (Z1, Z2)′ ∼ N2(0, I), where
I is the identity matrix (a simulation check appears after this list).
3. Let X1 ∼ Gamma(α1, β) and X2 ∼ Gamma(α2, β) be independent. Consider the transformation Z = X1/(X1 + X2). Then
Z ∼ beta(α1, α2).
4. Let (X, Y) be jointly distributed as N2(µx, µy, σx², σy², ρ). Then the marginal distribution of X
is normal(µx, σx²). Also, the conditional distribution of Y given X = x is normal(µy + ρσy(x −
µx)/σx, σy²(1 − ρ²)).
5. Let X1, . . . , Xn be iid Bernoulli(p). Then the conditional distribution of X = (X1, . . . , Xn) given X̄n = y is
free of p.

6. Let X1, . . . , Xn be iid Poisson(λ). Then the conditional distribution of X = (X1, . . . , Xn) given X̄n = y is free
of λ.
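The Box-Muller transformation of property 2 is easy to check empirically. The sketch below generates pairs (Z1, Z2) and verifies that their sample means are near 0, sample variances near 1, and sample correlation near 0, as expected for N2(0, I).

    import math
    import random
    import statistics

    # Sketch: generate standard normal pairs via the Box-Muller transformation of
    # property 2 and check their sample means, variances and correlation.
    random.seed(4)

    z1, z2 = [], []
    for _ in range(50000):
        u1, u2 = 1.0 - random.random(), random.random()   # u1 in (0, 1] avoids log(0)
        r = math.sqrt(-2.0 * math.log(u1))
        z1.append(r * math.cos(2.0 * math.pi * u2))
        z2.append(r * math.sin(2.0 * math.pi * u2))

    m1, m2 = statistics.mean(z1), statistics.mean(z2)
    cov = sum((a - m1) * (b - m2) for a, b in zip(z1, z2)) / (len(z1) - 1)
    corr = cov / (statistics.stdev(z1) * statistics.stdev(z2))

    print("means:", m1, m2)                                          # both ~ 0
    print("variances:", statistics.variance(z1), statistics.variance(z2))  # both ~ 1
    print("correlation:", corr)                                      # ~ 0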

Figure 3: Contour plots of bivariate normal distribution with different parameters

Week 2

Some important statistics and their sampling distribution


1. Sample mean. Let X1, . . . , Xn be a random sample from some distribution F. Then X̄n =
n^{−1} Σ_{i=1}^n Xi is called the sample mean.

Properties of sample mean.

(a) Let X1 , . . . , Xn be a random sample from some distribution F with expectation µ and finite
variance σ 2 . Then E(X̄n ) = µ and var(X̄n ) = σ 2 /n. [Proof]
(b) Let X1 , . . . , Xn be a random sample from normal(µ, σ 2 ) distribution. Then the sampling
distribution of X̄n is normal(µ, σ 2 /n). [Proof]
(c) Let X1 , . . . , Xn be a random sample from some distribution F with expectation µ, and finite
variance σ 2 . Then X̄n is the best linear unbiased estimator (BLUE) of µ. [Proof]

2. Sample variance. Let X1, . . . , Xn be a random sample from some distribution F. Then
Sn² = n^{−1} Σ_{i=1}^n (Xi − X̄n)² = n^{−1} Σ_{i=1}^n Xi² − X̄n² is called the sample variance.
The positive square root of the sample variance is called the sample standard deviation, and is denoted
by Sn.

Properties of sample variance.

(a) Let X1, . . . , Xn be a random sample from some distribution F with expectation µ and finite
variance σ². Then E[nSn²/(n − 1)] = σ². [Proof]
The statistic Sn*² = nSn²/(n − 1) is sometimes also referred to as the sample variance. We
will term this statistic the unbiased sample variance.
(b) Let X1, . . . , Xn be a random sample from the normal(µ, σ²) distribution. Then the sampling
distribution of nSn² is σ²χ²_{n−1}, i.e., nSn²/σ² ∼ χ²_{n−1}. [Proof]
(c) (Joint distribution of sample mean and variance for normal samples) Let X1, . . . , Xn
be a random sample from the normal(µ, σ²) distribution. Then X̄n and Sn² are independently
distributed. [Proof]

Thus, Tµ = √n(X̄n − µ)/Sn* ∼ t_{n−1}. (Why?)
3. Correlation coefficient. Let (X1, Y1), . . . , (Xn, Yn) be a bivariate random sample from some
distribution F. Then the sample correlation coefficient is

r_{x,y} = [n^{−1} Σ_{i=1}^n (Xi − X̄n)(Yi − Ȳn)] / (SX SY) = [n^{−1} Σ_{i=1}^n Xi Yi − X̄n Ȳn] / (SX SY),

where SX and SY are the sample standard deviations of the X's and Y's, respectively.
The numerator in the expression of r_{x,y} is called the sample covariance.
4. Multivariate extensions of sample mean and variance. Let X1, . . . , Xn be a random
sample from some multivariate distribution F. Then the sample mean X̄n and the sample variance-covariance
matrix Sn are defined as

X̄n = n^{−1} Σ_{i=1}^n Xi, and Sn = n^{−1} Σ_{i=1}^n (Xi − X̄n)(Xi − X̄n)′ = n^{−1} Σ_{i=1}^n Xi Xi′ − X̄n X̄n′.

Properties of sample mean and variance.
(a) If the samples are k-dimensional, then X̄n is a k-vector and Sn is a k × k positive semi-definite
matrix. The j-th diagonal element of Sn is the sample variance of the j-th component of
X, and the (i, j)-th element of Sn is the sample covariance between the i-th and j-th components of
X. (WHY?)
(b) Let X1, . . . , Xn be a random sample from some multivariate distribution F with expectation
µ and variance-covariance matrix Σ having finite components. Then E(X̄n) = µ, var(X̄n) =
n^{−1}Σ, and E(nSn/(n − 1)) = Σ. [Proof]
5. Sample moments. Let X1, . . . , Xn be a random sample from some distribution F. Then the
r-th order raw moment m′r and the r-th order central moment mr are defined as

m′r = n^{−1} Σ_i Xi^r, and mr = n^{−1} Σ_i (Xi − X̄n)^r, r > 0.

Properties of sample moments.


(a) The sample central moments can be derived from the sample raw moments and vice versa.
(b) Let X1, . . . , Xn be a random sample from some distribution F with r-th population raw moment
µ′r = E(X^r) < ∞. Then E(m′r) = µ′r. [Proof]
6. Order statistics. Let X1, . . . , Xn be a random sample from some distribution F. Then the
r-th order statistic X(r) is the r-th smallest of X1, . . . , Xn, r = 1, . . . , n. Therefore, X(1) =
min{X1, . . . , Xn} and X(n) = max{X1, . . . , Xn}.

Properties of order statistics.


(a) Let X1, . . . , Xn be a random sample from some distribution F with pdf fX. Then the joint pdf of (X(1), . . . , X(n)) is

f(X(1),...,X(n))(x) = n! Π_{i=1}^n fX(xi) if x1 < x2 < · · · < xn, and 0 otherwise. [Proof]

(b) Let X1, . . . , Xn be a random sample from some distribution with CDF FX. Then the CDF
of X(n), say Gn, is given by

Gn(t) = P(X(n) ≤ t) = {P(X ≤ t)}^n = FX(t)^n (a simulation check appears at the end of this subsection). [Proof]

(c) Let X1, . . . , Xn be a random sample from some distribution with CDF FX. Then the CDF
of X(1), say Hn, is given by

Hn(t) = P(X(1) ≤ t) = 1 − {1 − P(X ≤ t)}^n = 1 − {1 − FX(t)}^n.

(d) Let X1, . . . , Xn be a random sample from some distribution with CDF FX and pdf fX.
Then the pdf of X(r), say gr, is given by

gr(t) = [n!/((r − 1)!(n − r)!)] FX(t)^{r−1} fX(t) {1 − FX(t)}^{n−r}.

(e) Let X1, . . . , Xn be a random sample from some distribution with CDF FX and pdf fX. Then
the joint pdf of X(r) and X(s) with r < s, say g_{r,s}, is given by

g_{r,s}(w, t) = [n!/((r − 1)!(s − r − 1)!(n − s)!)] FX(w)^{r−1} {FX(t) − FX(w)}^{s−r−1} fX(w) fX(t) {1 − FX(t)}^{n−s} if w < t,
and 0 otherwise.

7. Sample median. Let X1, . . . , Xn be a random sample from some distribution F. Then the
sample median X̃me is given by

X̃me = X((n+1)/2) if n is odd, and X̃me = {X(n/2) + X(n/2+1)}/2 if n is even.
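Property (b) of order statistics, Gn(t) = FX(t)^n, can be verified by simulation. The sketch below uses a uniform(0, 1) sample (an arbitrary illustrative choice), for which FX(t) = t on [0, 1], so the CDF of the maximum should be t^n.

    import random

    # Sketch: check G_n(t) = P(X_(n) <= t) = F_X(t)^n for the maximum of a
    # uniform(0, 1) sample, where F_X(t) = t on [0, 1].
    random.seed(5)
    n, replications = 10, 100000

    maxima = [max(random.random() for _ in range(n)) for _ in range(replications)]

    for t in (0.5, 0.8, 0.95):
        empirical = sum(1 for m in maxima if m <= t) / replications
        print(f"t = {t}: empirical {empirical:.4f}, theoretical t^n = {t ** n:.4f}")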

IV. Large Sample Results. Two large sample results will be useful in MTH211.

1. (Weak Law of Large Numbers, WLLN) Let X1, . . . , Xn be a random sample from a
population with E(g(X)) = η < ∞, where g : R → R is a function. Then n^{−1} Σ_{i=1}^n g(Xi) →p η
as n → ∞, i.e., for any ϵ > 0,

lim_{n→∞} P(|n^{−1} Σ_{i=1}^n g(Xi) − η| > ϵ) = 0.

2. (Central Limit Theorem, CLT) Let X1, . . . , Xn be a random sample from some distribution
with expectation µ and finite variance σ². Then Tn^{µ,σ} = √n(X̄n − µ)/σ →d normal(0, 1); that
is, for any x ∈ R, the CDF of Tn^{µ,σ}, say Gn, satisfies

Gn(x) → Φ(x), as n → ∞,

where Φ is the CDF of the normal(0, 1) distribution (a simulation sketch appears after the corollary below).

Corollary:

(a) For any r > 0 such that µ′_{2r} < ∞, m′r →p µ′r.

(b) Let X1, · · · , Xn be a random sample from uniform(0, θ), θ > 0. Then X(n) →d δ{θ}, where δ{θ} is
the degenerate distribution with all its probability mass at θ.
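The CLT can be illustrated numerically. The sketch below uses an exponential(1) population (an arbitrary choice with µ = σ = 1), computes the standardized mean Tn = √n(X̄n − µ)/σ for many replications, and compares its empirical CDF with Φ.

    import math
    import random

    # Sketch: illustrate the CLT for an exponential(1) population (mean 1, variance 1),
    # chosen arbitrarily for illustration. The standardized mean should be close to
    # normal(0, 1) for moderately large n.
    random.seed(6)
    n, replications = 50, 20000
    mu, sigma = 1.0, 1.0   # expectation and SD of exponential(1)

    def standardized_mean():
        sample = [random.expovariate(1.0) for _ in range(n)]
        return math.sqrt(n) * (sum(sample) / n - mu) / sigma

    t_values = [standardized_mean() for _ in range(replications)]

    def std_normal_cdf(x):
        return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

    for x in (-1.0, 0.0, 1.0, 2.0):
        empirical = sum(1 for t in t_values if t <= x) / replications
        print(f"x = {x}: empirical CDF {empirical:.4f}, Phi(x) = {std_normal_cdf(x):.4f}")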
