
Week 2 Lecture 1

This document discusses the desirable properties of point estimators, focusing on parametric and nonparametric techniques, and the assumption of normality in statistical procedures. It outlines the requirements for parametric tests, methods for checking normality, and the implications of normality on hypothesis testing. Additionally, it highlights the limitations of normality tests and emphasizes the importance of using a combination of graphical and statistical methods to assess normality.

Uploaded by

jdmckie

ECON20003: QM2

WEEK 2: DESIRABLE PROPERTIES OF POINT ESTIMATORS


PARAMETRIC AND NONPARAMETRIC TECHNIQUES
THE ASSUMPTION OF NORMALITY
References:
S: § 10.1
W: 3.7

Notes prepared by:


Dr László Kónya and
Dr Mehmet Özmen

Faculty of Business and Economics


Department of Economics
PARAMETRIC AND NONPARAMETRIC TECHNIQUES
• Many statistical procedures for interval estimation and hypothesis testing
a) are concerned with population parameters, and
b) are based on certain assumptions about the sampled
population or about the sampling distribution of some point
estimator.
These procedures are usually referred to as parametric procedures.

e.g., the confidence interval estimation and hypothesis testing of a
population mean based on the t distribution are parametric
procedures as they are concerned with the population mean and
assume that
i. The sample has been randomly selected.
Otherwise it might not represent the population accurately.
ii. The variable of interest is quantitative …
iii. … and is measured on an interval or a ratio scale.
Otherwise the population mean would not exist and the
central location could be measured only with the mode and
the median (if the measurement scale is at least ordinal).
iv. The population standard deviation is unknown, but the population
is normally distributed, at least approximately.

Nonparametric procedures: Procedures that are either not
concerned with some population parameter or are based on relatively
weaker assumptions than their parametric counterparts, and hence
require less information about the sampled population.
For example:
Parametric                              Nonparametric
Paired t-test                           Wilcoxon signed-rank test
Unpaired t-test                         Mann-Whitney U test (Wilcoxon rank-sum test)
Pearson correlation                     Spearman correlation
One-way analysis of variance (ANOVA)    Kruskal-Wallis test

UoM, ECON 20003, Week 2


THE ASSUMPTION OF NORMALITY

• A crucial assumption behind most parametric procedures is normality,
namely that the underlying sampling distribution is normally distributed.

e.g., when testing a population mean with a parametric test,
either σ should be known and the sample mean should be
normally distributed (Z-test),
or if σ is unknown, the sampled population itself should be
normally distributed (t-test), implying that the sample mean is
also normally distributed.
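
The Z-test case can be sketched in a few lines. This is only an illustration in Python (the lecture itself works in R); the function name z_test_upper, the sample values and the "known" σ are all hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean

def z_test_upper(sample, mu0, sigma):
    """Upper-tail Z-test of H0: mu = mu0 against HA: mu > mu0, sigma known."""
    n = len(sample)
    # Standardise the sample mean using the known population sd
    z = (mean(sample) - mu0) / (sigma / sqrt(n))
    # Upper-tail p-value from the standard normal distribution
    p_value = 1 - NormalDist().cdf(z)
    return z, p_value

# Hypothetical data and a hypothetical known sigma (made-up numbers)
sample = [12.1, 9.8, 14.3, 11.0, 13.5, 10.7, 12.9, 11.6]
z, p = z_test_upper(sample, mu0=10, sigma=2.0)
print(round(z, 3), round(p, 4))
```

H0 is rejected when the p-value falls below the chosen significance level. When σ is unknown the t distribution is needed instead, which the Python standard library does not provide.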

Normality can be verified in a number of ways:

(i) graphs,
(ii) sample statistics and
(iii) formal hypothesis tests.



i. Checking normality visually

We can use two types of graphs to study whether a data set is characterised by
a normal distribution: histogram and QQ-plot.

The QQ (quantile-quantile) plot is a scatter plot that depicts the
quantiles of the sample data against the corresponding quantiles of
some known probability distribution.
When it is used for checking normality, the reference distribution is a
(standard) normal distribution and if the sample data are normally
distributed, the points on the scatter plot lie approximately on a straight line.
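
How the points of a normal QQ-plot are obtained can be sketched as follows. This is a Python illustration, not the R code used in the lecture. The plotting positions (i − 0.5)/n are one common convention and are an assumption here, as are the function names and the made-up samples.

```python
from statistics import NormalDist, mean

def normal_qq_points(sample):
    """Pair each ordered observation with a standard normal quantile."""
    n = len(sample)
    ordered = sorted(sample)
    std_normal = NormalDist()  # standard normal reference distribution
    # Plotting positions (i - 0.5) / n: an assumed, commonly used convention
    theoretical = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, ordered))

def qq_correlation(sample):
    """Correlation between theoretical and observed quantiles (1 = straight line)."""
    pairs = normal_qq_points(sample)
    xs = [p[0] for p in pairs]
    ys = [p[1] for p in pairs]
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

# Hypothetical right-skewed sample (made-up numbers)
skewed = [1, 1, 2, 2, 2, 3, 3, 4, 5, 7, 9, 14, 22, 40]
# Sample constructed to follow a normal shape exactly
normal_like = [10 + 2 * NormalDist().inv_cdf((i - 0.5) / 14) for i in range(1, 15)]

print(round(qq_correlation(normal_like), 3))
print(round(qq_correlation(skewed), 3))
```

A correlation close to 1 means the points lie nearly on a straight line. The skewed sample gives a visibly lower value.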

Ex 1: (Week 1, Ex 2)

• Last week we performed a t-test to find out
whether there was sufficient evidence at the 5%
level of significance to establish that the average
Australian is more than 10 kg overweight.
• The sample size was large enough (n = 100) to
rely on CLT, so the sampling distribution of the
sample mean could be assumed approximately
normal.
• However, σ was unknown, so we had to
assume that the sampled population was not
extremely non-normal in order to be able to rely
on the t-test.
a) Develop a histogram and a QQ-plot of diff with R to see whether the
sampled population might be normally distributed.

[Figures: histogram of diff with a normal curve that has the same mean and
standard deviation as the sample of diff; QQ-plot of diff]

The histogram is skewed to the right and on the QQ-plot the points are
scattered around the straight line rather than lying on it. Hence, both
graphs suggest that diff is unlikely to be normally distributed.
ii. Quantifying normality with numerical descriptive measures
There are four simple numerical descriptive measures that can help us
decide whether a data set is characterised by a normal distribution: mean,
median, skewness and kurtosis.

For (continuous and unimodal) symmetric distributions, such as the normal:
mean = median
For right (positively) skewed distributions, typically:
median < mean
For left (negatively) skewed distributions, typically:
mean < median.
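
The mean-median comparison can be sketched as a quick check. This is a Python illustration with made-up samples, and the function name skew_direction is hypothetical. The comparison is only a rough indicator, not a test.

```python
from statistics import mean, median

def skew_direction(sample):
    """Rough indication of skew from comparing the mean with the median."""
    m, med = mean(sample), median(sample)
    if m > med:
        return "right"      # median < mean
    if m < med:
        return "left"       # mean < median
    return "symmetric"      # mean = median

# Hypothetical samples (made-up numbers)
print(skew_direction([1, 2, 3, 4, 5]))      # symmetric around 3
print(skew_direction([1, 2, 3, 4, 20]))     # one large value pulls the mean up
print(skew_direction([1, 17, 18, 19, 20]))  # one small value pulls the mean down
```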



• Skewness (SK) is a descriptor of the shape of a distribution and it is
concerned with the asymmetry of a distribution around its mean.

The population parameter for SK
is the third standardized moment
defined as:
$SK = E[(X - \mu)^3] / \sigma^3$

For symmetric distributions SK = 0, for distributions that are skewed to
the right SK > 0, and for distributions that are skewed to the left SK < 0.

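The pastecs package of R
estimates SK with:
$\widehat{SK} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^3}{n s^3}$, where s is the sample standard deviation.

The approximate estimated
standard error of this statistic is
$SE(\widehat{SK}) = \sqrt{\frac{6n(n-1)}{(n-2)(n+1)(n+3)}}$

The data are likely asymmetric,
thus non-normal (at α = 0.05), when
$|\widehat{SK}| / (2 \, SE(\widehat{SK})) > 1$, i.e. when |skew.2SE| > 1.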
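
The skewness check can be sketched from raw data as follows. This is a Python illustration, not the pastecs code itself. It assumes the estimator divides the summed cubed deviations by n·s³ (with s the sample standard deviation) and uses the standard error √(6n(n−1)/((n−2)(n+1)(n+3))). The function name and the sample are hypothetical.

```python
from math import sqrt
from statistics import mean, variance

def skewness_check(sample):
    """Return (SK-hat, SE, SK-hat / (2 * SE)) under the assumed formulas."""
    n = len(sample)
    xbar = mean(sample)
    s = sqrt(variance(sample))  # sample standard deviation (n - 1 divisor)
    sk = sum((x - xbar) ** 3 for x in sample) / (n * s ** 3)
    se = sqrt(6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))
    return sk, se, sk / (2 * se)

# Hypothetical right-skewed sample (made-up numbers)
sample = [1, 1, 2, 2, 2, 3, 3, 4, 5, 7, 9, 14, 22, 40]
sk, se, ratio = skewness_check(sample)
print(round(sk, 3), round(se, 3), round(ratio, 3))
# A ratio above 1 in absolute value points to asymmetry at roughly the 5% level.
```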
• Kurtosis (K) is another descriptor of the shape of a (unimodal)
distribution. It is about the tails of a distribution, i.e. about outliers,
relative to the normal distribution.

[Figure: three density curves]
These curves illustrate three different
allocations of the unit probability of
the certain event over the range of
possible values.

A distribution whose tails are relatively long and thus has more outliers,
is called leptokurtic (leptos is Greek for thin, fine).
A distribution whose tails are relatively short and thus has fewer
outliers, is called platykurtic (platus is Greek for broad, flat).

Note: In terms of parametric vs. nonparametric procedures, the real issue is
whether the distribution is normal or not; the distinction between
leptokurtic and platykurtic is of secondary importance.
The population parameter for K is
the fourth standardized moment
defined as:
$K = E[(X - \mu)^4] / \sigma^4$

K = 3 for normal distributions, K > 3 for leptokurtic distributions, and
K < 3 for platykurtic distributions. K − 3 is called excess kurtosis.

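The pastecs package of R
estimates K − 3 with
$\widehat{K} - 3 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^4}{n s^4} - 3$, where s is the sample standard deviation.

The approximate estimated
standard error of this statistic is
$SE(\widehat{K}) = \sqrt{\frac{24n(n-1)^2}{(n-3)(n-2)(n+3)(n+5)}}$

The data are likely non-normal
(at α = 0.05), when
$|\widehat{K} - 3| / (2 \, SE(\widehat{K})) > 1$, i.e. when |kurt.2SE| > 1.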
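
The kurtosis check can be sketched in the same way. This is a Python illustration, not the pastecs code itself. It assumes the excess-kurtosis estimator divides the summed fourth-power deviations by n·s⁴ and subtracts 3, with standard error √(24n(n−1)²/((n−3)(n−2)(n+3)(n+5))). The function name and both samples are hypothetical.

```python
from math import sqrt
from statistics import mean, variance

def excess_kurtosis_check(sample):
    """Return (K-hat - 3, SE, (K-hat - 3) / (2 * SE)) under the assumed formulas."""
    n = len(sample)
    xbar = mean(sample)
    var = variance(sample)  # sample variance (n - 1 divisor)
    k_excess = sum((x - xbar) ** 4 for x in sample) / (n * var ** 2) - 3
    se = sqrt(24 * n * (n - 1) ** 2 / ((n - 3) * (n - 2) * (n + 3) * (n + 5)))
    return k_excess, se, k_excess / (2 * se)

# Hypothetical samples (made-up numbers)
flat = list(range(100))            # evenly spread values, short tails
heavy = [0] * 98 + [-50, 50]       # two extreme outliers, long tails

print([round(v, 3) for v in excess_kurtosis_check(flat)])
print([round(v, 3) for v in excess_kurtosis_check(heavy)])
```

A negative estimate indicates a platykurtic sample and a positive one a leptokurtic sample. The ratio to twice the standard error indicates whether the departure from K = 3 is large relative to sampling noise.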



iii. Testing for normality

There are several statistical tests for normality, i.e. for
H0 : the data come from a normally distributed population;
HA : the data come from a non-normally distributed population.

We use only the Shapiro-Wilk (SW) test because it is easy to
implement in R and compares favourably to other tests for normality at
the limited sample sizes we usually have to work with in economics,
business and marketing.

We do not discuss the details of this test as we shall always perform it
with R. The program reports the test statistic and the p-value and H0 is
rejected if the p-value is smaller than the selected significance level.



Some key practical insights:
a) The t-test is fairly robust to departures from normality and hence reliable in
practice, unless the sample size is very small (say, less than 30) and/or the
population is strongly non-normal (e.g. skewed).
b) The SW test, similarly to other tests for normality, has two shortcomings.
(i) At small sample sizes (say, n < 20), when the normality assumption can be
crucial, it has little power to reject H0 even if the population is indeed not
normally distributed (Type II error).
(ii) At large sample sizes (say, n > 100), when the violation of normality is far less
critical in practice, it tends to be too sensitive to the slightest signs of
non-normality in the sample and often rejects H0 even when the departure from
normality is too small to matter in practice.
For these reasons, it is not recommended to rely entirely on the SW test.
It is always better to assess normality with a combination of graphs, sample statistics
and formal hypothesis tests, though at small sample sizes all these checks can be
unreliable.
(Ex 1)
b) Obtain descriptive statistics and the SW statistic for diff with R and discuss their implications
about normality.
The stat.desc function of the pastecs package generates the following printout:

i. The sample mean (12.175) is bigger than the sample median (10.500), so the sample of diff
is skewed to the right (non-normal).

ii. SK-hat = 0.556 is positive, so the sample of diff is skewed to the right. SK-hat divided by
twice its standard error is skew.2SE = 1.151 > 1, so the distribution of diff is unlikely
to be normal.

iii. The estimate of excess kurtosis is K-hat − 3 = -0.548. It is negative, so the sample of diff is
platykurtic. However, the absolute value of K-hat − 3 divided by twice its standard error is
|kurt.2SE| = 0.573 < 1, so the distribution of diff might be normal.

iv. The reported p-value of the SW test is normtest.p = 0.001 < 0.05, thus normality is rejected
at the 5% level.

Since 3 out of 4 checks cast doubt on normality, the t-test in Ex 2, Week 1 might be misleading.
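
The two ratios in (ii) and (iii) can be approximately reproduced from the reported estimates alone. This is a Python sketch that assumes the standard errors are √(6n(n−1)/((n−2)(n+1)(n+3))) for skewness and √(24n(n−1)²/((n−3)(n−2)(n+3)(n+5))) for excess kurtosis. The variable names are illustrative, and the inputs are the rounded values quoted in the text.

```python
from math import sqrt

n = 100                 # sample size from Ex 1
sk_hat = 0.556          # reported skewness estimate (rounded)
k_excess_hat = -0.548   # reported excess kurtosis estimate (rounded)

# Assumed standard error formulas for the two shape statistics
se_skew = sqrt(6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))
se_kurt = sqrt(24 * n * (n - 1) ** 2 / ((n - 3) * (n - 2) * (n + 3) * (n + 5)))

skew_2se = sk_hat / (2 * se_skew)
kurt_2se = k_excess_hat / (2 * se_kurt)

print(round(skew_2se, 2), round(kurt_2se, 2))
print(abs(skew_2se) > 1, abs(kurt_2se) > 1)
```

The results are close to the reported skew.2SE = 1.151 and |kurt.2SE| = 0.573. Small differences in the last digit come from using the rounded inputs.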