PLU Quantitative Techniques 2

Quantitative techniques notes


ECON-3202: Quantitative Techniques

Lecture Notes 2

Wanangwa Gondwe
Pentecostal Life
University

2 Descriptive Statistics

2.1 Introduction
This section presents an introduction to descriptive statistics. In this unit we will qualita-
tively and quantitatively describe characteristics of data. We will summarize attributes of a
sample with the aim of knowing its nature and present it so that others can understand and
use the information it contains. We will cover ways to numerically locate data centrally and
ascertain the nature of its variability. Most descriptive statistics are presented in what is
known as “summary statistics” because they summarize the characteristics of the data. They
provide visually easy to understand graphics and tables that will promote comprehension
and further inquiry. When we present descriptive statistics collectively either graphically or
in tabular form we do what is called exploratory data analysis.
Descriptive statistics offers several techniques for organizing data. Bar graphs, pie charts,
frequency distributions, histograms, and stem-and-leaf plots are techniques for describing
data. Often, we are interested in a typical numerical value to help us describe a data
set. This typical value is often called an average value or a measure of central tendency. We
are looking for a single number that is in some sense representative of the complete data set.
In statistics, this is called data reduction.

2.1.1 Summarizing qualitative data

Frequency distribution

When data are qualitative, we use names to identify the different categories (or classes).
Often we summarize qualitative data by using a frequency distribution. It is a tabular
summary of data showing the number (frequency) of items in each of several non-overlapping
classes. A frequency distribution for qualitative data lists all categories and the number of
elements that belong to each of the categories. For example, Table 1 shows the number of
survey interviews reached in each region.

Table 1: Frequency of interviews reached by region

Region      Frequency
North       2176
Central     3952
Southern    5306

Source: Malawi Integrated Household Survey data, 2019

Relative frequency

A relative frequency distribution gives a tabular summary of data showing the relative
frequency for each class. For example, the second column of Table 1 shows the frequency of
each class. Dividing each frequency by the total (2176 + 3952 + 5306 = 11434) gives the
relative frequencies shown in Table 2. A percentage frequency distribution summarizes the
percent frequency of the data for each class. Note: to get the percent frequency, multiply
the relative frequency by 100.

Table 2: Relative Frequency of interviews reached by region

Region      Relative Frequency
North       0.1903096
Central     0.3456358
Southern    0.4640546

Source: Malawi Integrated Household Survey data, 2019

Summary: The relative frequency of a category is obtained by dividing the frequency for a
category by the sum of all the frequencies. Multiplying the relative frequency by 100 gives
the percent frequency; for the North, 0.1903096 × 100 ≈ 19.03%.
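The calculation can be sketched in a few lines of Python. This is only an illustration using the Table 1 frequencies; the variable names are our own and are not part of the survey data.

```python
# Frequencies from Table 1 (interviews reached by region).
frequency = {"North": 2176, "Central": 3952, "Southern": 5306}

total = sum(frequency.values())  # sum of all the frequencies

# Relative frequency = class frequency / sum of all frequencies.
relative = {region: count / total for region, count in frequency.items()}

# Percent frequency = relative frequency * 100.
percent = {region: 100 * share for region, share in relative.items()}

for region in frequency:
    print(f"{region:<9} {frequency[region]:>5} {relative[region]:.7f} {percent[region]:6.2f}%")
```

The relative frequencies always sum to 1 and the percent frequencies to 100, which is a quick check on the arithmetic.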

Cumulative distributions

A cumulative frequency distribution gives the total number of values that fall below various
class boundaries of a frequency distribution. Table below shows the frequency distribution
of the contents in milliliters of a sample of 25 one-liter bottles of Orange Squash.

Table 3: Frequency distribution table

Range Frequency
970 to less than 990 5
990 to less than 1010 10
1010 to less than 1030 5
1030 to less than 1050 3
1050 to less than 1070 2

From the Table above we can construct the cumulative frequency distribution as below.

Table 4: Cumulative frequency distribution table

Content less than   Cumulative frequency   Cumulative relative frequency   Cumulative percent
970                 0                      0/25 = 0                        0%
990                 5                      5/25 = 0.20                     20%
1010                5 + 10 = 15            15/25 = 0.60                    60%
1030                15 + 5 = 20            20/25 = 0.80                    80%
1050                20 + 3 = 23            23/25 = 0.92                    92%
1070                23 + 2 = 25            25/25 = 1                       100%

From the Table above cumulative relative frequency is obtained by dividing a cumulative
frequency by the total number of observations in the data set. Cumulative percentages are
obtained by multiplying cumulative relative frequencies by 100.
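As a rough sketch, the running totals for the Orange Squash data can be computed with `itertools.accumulate` from the Python standard library. The list names are illustrative assumptions.

```python
from itertools import accumulate

# Upper class boundaries (ml) and class frequencies from Table 3.
upper_boundaries = [990, 1010, 1030, 1050, 1070]
frequencies = [5, 10, 5, 3, 2]

n = sum(frequencies)                         # total number of bottles
cumulative = list(accumulate(frequencies))   # running totals of the frequencies

for bound, cum in zip(upper_boundaries, cumulative):
    relative = cum / n                       # cumulative relative frequency
    print(f"less than {bound}: {cum:>2}  {relative:.2f}  {100 * relative:.0f}%")
```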

2.1.2 Summarizing quantitative data

Frequency distribution for quantitative data

There are many similarities between frequency distributions for qualitative data and fre-
quency distributions for quantitative data. Terminology for frequency distributions of quan-
titative data is discussed first, and then examples illustrating the construction of frequency
distributions for quantitative data are given. The table below gives a frequency distribution of
the University of Malawi Entrance examination scores.

Table 5: Test scores for LUANAR Entrance exams

Test score   Frequency
80-94        8
95-109       14
110-124      24
125-139      16
140-154      13

The frequency distribution given in Table above is composed of five classes. The classes
are: 80-94, 95-109, 110-124, 125-139, and 140-154. Each class has a lower class limit and
an upper class limit. The lower class limits for this distribution are 80, 95, 110, 125, and
140. The upper class limits are 94, 109, 124, 139, and 154. If the lower class limit for the
second class, 95, is added to the upper class limit for the first class, 94, and the sum divided
by 2, the upper boundary for the first class and the lower boundary for the second class are
determined. Table below gives all the boundaries for Table above. If the lower class limit is
added to the upper class limit for any class and the sum divided by 2, the class mark for that
class is obtained. The class mark for a class is the midpoint of the class and is sometimes
called the class midpoint rather than the class mark. The class marks for Table above are
shown in Table below. The difference between the boundaries for any class gives the class
width for a distribution. The class width for the distribution in Table below is 15.
Table 6: Class limit, boundary, width and mark

Class limit Class boundaries Class width Class mark

80-94 79.5-94.5 15 87
95-109 94.5-109.5 15 102
110-124 109.5-124.5 15 117
125-139 124.5-139.5 15 132
140-154 139.5-154.5 15 147
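The boundaries, widths, and class marks in Table 6 can be reproduced from the class limits alone. The following is a minimal sketch, assuming (as in Table 5) that the classes are listed in order and separated by a constant gap; the variable names are our own.

```python
# Class limits from Table 5 (scores are whole numbers).
class_limits = [(80, 94), (95, 109), (110, 124), (125, 139), (140, 154)]

# Gap between the upper limit of one class and the lower limit of the next.
gap = class_limits[1][0] - class_limits[0][1]  # 95 - 94

rows = []
for lower, upper in class_limits:
    lower_boundary = lower - gap / 2   # halfway to the previous class
    upper_boundary = upper + gap / 2   # halfway to the next class
    width = upper_boundary - lower_boundary
    mark = (lower + upper) / 2         # class midpoint
    rows.append((lower_boundary, upper_boundary, width, mark))

for (lower, upper), row in zip(class_limits, rows):
    print(f"{lower}-{upper}: boundaries {row[0]}-{row[1]}, width {row[2]}, mark {row[3]}")
```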

2.1.3 General rules for forming frequency distributions

1. Determine the largest and smallest numbers in the raw data and thus find the range
(the difference between the largest and smallest numbers).
2. Divide the range into a convenient number of class intervals having the same size. If
this is not feasible, use class intervals of different sizes or open class intervals. The
number of class intervals is usually between 5 and 20, depending on the data. Class
intervals are also chosen so that the class marks (or midpoints) coincide with the
actually observed data. This tends to lessen the so-called grouping error involved in
further mathematical analysis. However, the class boundaries should not coincide with
the actually observed data.
3. Determine the number of observations falling into each class interval; that is, find the
class frequencies.

Example

The following data set gives the yearly food distribution expenditure in thousands of MK
for 25 households in TA Chapananga in Chikwawa:

2.3, 1.9, 1.1, 3.2, 2.7, 1.5, 0.7, 2.5, 2.5,
3.1, 2.5, 2.0, 2.7, 1.9, 2.2, 1.2, 1.3, 1.7,
2.9, 3.0, 3.2, 1.7, 2.2, 2.7, 2.0

Construct a frequency distribution consisting of six classes for this data set. Use 0.5 as the
lower limit for the first class and use a class width equal to 0.5.

Solution

The first class would extend from 0.5 to 0.9 since the desired lower limit is 0.5 and the desired
class width is 0.5. Note that the class boundaries are 0.45 and 0.95 and therefore the class
width equals 0.95 - 0.45 or 0.5. The frequency distribution is shown in Table below.
Table 7: Emergency Expenditure in Chikwawa

Expenditure   Frequency
0.5 - 0.9     1
1.0 - 1.4     3
1.5 - 1.9     5
2.0 - 2.4     5
2.5 - 2.9     7
3.0 - 3.4     4
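Tallying by hand is error-prone, so it can help to check the class frequencies with a short script. The sketch below bins the 25 expenditure values into the six classes of the example; working in whole tenths is our own design choice to avoid floating-point rounding at the class limits, and the constant names are assumptions.

```python
expenditure = [
    2.3, 1.9, 1.1, 3.2, 2.7, 1.5, 0.7, 2.5, 2.5,
    3.1, 2.5, 2.0, 2.7, 1.9, 2.2, 1.2, 1.3, 1.7,
    2.9, 3.0, 3.2, 1.7, 2.2, 2.7, 2.0,
]

FIRST_LOWER_TENTHS = 5   # lower limit of the first class (0.5) in tenths
WIDTH_TENTHS = 5         # class width (0.5) in tenths
N_CLASSES = 6

counts = [0] * N_CLASSES
for value in expenditure:
    tenths = round(value * 10)  # e.g. 2.3 -> 23
    counts[(tenths - FIRST_LOWER_TENTHS) // WIDTH_TENTHS] += 1

for k, count in enumerate(counts):
    lower = (FIRST_LOWER_TENTHS + k * WIDTH_TENTHS) / 10
    upper = lower + (WIDTH_TENTHS - 1) / 10
    print(f"{lower:.1f} - {upper:.1f}: {count}")
```

The class frequencies must add up to the number of observations (25), which is a useful check on any frequency table.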

Dot plot

A dot plot is a very simple graph that can be used to summarize a data set.
To make a dot plot we draw a horizontal axis that spans the range of the measurements in
the data set. We then place dots above the horizontal axis to represent the measurements.
As an example, the figure below shows a dot plot of the scores on the first Statistics 1 test
of the semester. The horizontal axis spans exam scores from 30 to 100. Each dot above
the axis represents an exam score. For instance, the two dots above the score of 90 tell us
that two students received a 90 on the exam. The dot plot shows us that there are two
concentrations of scores: those in the 80s and 90s and those in the 60s.

32 63 69 85 91 45 64 69 86 92 50 64 72 87 92 56 65 76 87 93
58 66 78 88 93 60 67 81 89 94 61 67 83 90 96 61 68 83 90 98

[Dot plot omitted. Horizontal axis: x1 (exam score); vertical axis: count.]

Figure 1: Dot Plot of Scores on Statistics for Economists Test 1

Histogram

A histogram is a graph that shows the distribution of numerical data. It groups the data
into ranges (classes) and plots the frequency of each range as a bar, so it is a bar graph of a
frequency distribution. The figure below shows a histogram of stunting in under-five children in
Malawi using DHS data.

2.2 Exploratory data analysis

2.2.1 Stem and leaf diagram

Another simple graph that can be used to quickly summarize a data set is called a
stem-and-leaf display. This kind of graph places the measurements in order from smallest to largest,
and allows the analyst to simultaneously see all of the measurements in the data set and see
the shape of the data set's distribution. The following are the mileages of cars imported from
Singapore to Malawi:

30.8 30.8 32.1 32.3 32.7 31.7 30.4 31.4 32.7 31.4 30.1 32.5 30.8 31.2 31.8
31.6 30.3 32.8 30.7 31.9 32.1 31.3 31.9 31.7 33.0 33.3 32.1 31.4 31.4 31.5
31.3 32.5 32.4 32.2 31.6 31.0 31.8 31.0 31.5 30.6 32.0 30.5 29.8 31.7 32.3
32.4 30.5 31.1 30.7 31.4

[Histogram omitted. Horizontal axis: Household size; vertical axis: Count.]

Figure 2: Household size histogram plot

To develop a stem-and-leaf display, we note that the sample mileages range from 29.8 to
33.3 and we place the leading digits of these mileages (the whole numbers 29, 30, 31, 32,
and 33) in a column on the left side of a vertical line. This vertical arrangement of leading
digits forms the stem of the display. Next, we pass through the mileages listed above one
at a time and place each last digit (the tenths place) to the right of the vertical line in the
row corresponding to its leading digits. We form the leaves of the display by continuing this
procedure as we pass through all 50 mileages. After recording the last digit for each of the
mileages, we sort the digits in each row from smallest to largest. In the display that follows,
each of the stems 30, 31, and 32 is split into two rows, the first holding leaves 0 to 4 and the
second holding leaves 5 to 9:

The decimal point is at the |

29 | 8
30 | 134
30 | 55677888
31 | 00123344444
31 | 55667778899
32 | 011123344
32 | 55778
33 | 03

Example

During a study of willingness to buy fish in Lilongwe market, ages of consumers who were
randomly picked were recorded. Their ages were 11, 11, 12, 14, 16, 17, 21, 23, 24, 25, 29, 30,
30, 32, 37, 40, 41, 53, 60. Draw a stem and leaf diagram of the data.

Solution

The decimal point is 1 digit(s) to the right of the |

1 | 112467
2 | 13459
3 | 0027
4 | 01
5 | 3
6 | 0
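The same display can be built programmatically. The sketch below is one possible way to do it for two-digit whole numbers, using the tens digit as the stem and the units digit as the leaf; it is not how the display above was necessarily produced.

```python
from collections import defaultdict

ages = [11, 11, 12, 14, 16, 17, 21, 23, 24, 25, 29, 30,
        30, 32, 37, 40, 41, 53, 60]

# Group the units digits (leaves) under their tens digit (stem).
stems = defaultdict(list)
for age in sorted(ages):
    stem, leaf = divmod(age, 10)
    stems[stem].append(leaf)

# Print every stem from the smallest to the largest, even if it has no leaves.
lines = []
for stem in range(min(stems), max(stems) + 1):
    leaves = "".join(str(leaf) for leaf in stems.get(stem, []))
    lines.append(f"{stem} | {leaves}")

print("\n".join(lines))
```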

Advantages

• Easy to construct by hand

• Gives more information than a histogram because it retains the actual data values

2.2.2 Pie chart

A pie chart is an effective method for displaying the percentage of observations that fall into each
category of qualitative data, that is, the percentage breakdown of a whole entity. It is a
circular statistical graphic, which is divided into slices to
illustrate numerical proportion. In a pie chart, the arc length of each slice (and consequently
its central angle and area) is proportional to the quantity it represents. While it is named
for its resemblance to a pie which has been sliced, there are variations on the way it can be
presented.

2.2.3 Contingency Tables

Previous sections in this chapter have presented methods for summarizing data for a single
variable. Often, however, we wish to use statistics to study possible relationships between
several variables. In this section we present a simple way to study the relationship between
two variables. Crosstabulation is a process that classifies data on two dimensions. This
process results in a table that is called a contingency table. Such a table consists of rows and
columns: the rows classify the data according to one dimension and the columns classify
the data according to a second dimension. Together, the rows and columns represent all
possibilities.

Figure 3: A histogram showing symmetric data

Table 8: Contingency Table

Region     Access to Credit   No Access to Credit   Total
North      575                1601                  2176
Central    1074               2878                  3952
Southern   1656               3650                  5306
Total      3305               8129                  11434

Source: Malawi IHS data, 2019

Simpson's paradox: Exercise caution in interpreting cross tabulations. In some cases
conclusions based upon aggregated cross tabulations can be completely reversed if we look
at the unaggregated data.
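A contingency table is often easier to read once the counts are converted to row proportions. The sketch below recomputes the margins of Table 8 and the share of households with access to credit in each region; the dictionary layout and names are illustrative assumptions.

```python
# Counts from Table 8: (access to credit, no access to credit) by region.
table = {
    "North": (575, 1601),
    "Central": (1074, 2878),
    "Southern": (1656, 3650),
}

row_totals = {region: yes + no for region, (yes, no) in table.items()}
column_totals = (
    sum(yes for yes, _ in table.values()),
    sum(no for _, no in table.values()),
)
grand_total = sum(row_totals.values())

# Row proportion: share of each region's households with access to credit.
access_share = {region: yes / row_totals[region] for region, (yes, _) in table.items()}

for region, share in access_share.items():
    print(f"{region:<9} {100 * share:5.1f}% with access to credit")
print(f"Overall   {100 * column_totals[0] / grand_total:5.1f}% with access to credit")
```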

2.2.4 Scatter plots

We often study relationships between variables by using graphical methods. A simple graph
that can be used to study the relationship between two variables is called a scatter plot.
As an example, suppose that a marketing manager wishes to investigate the relationship
between the sales volume (in thousands of units) of a product and the amount spent (in
units of MK10,000) on advertising the product. To do this, the marketing manager randomly
selects 10 sales districts having equal sales potential. The manager assigns a different level
of advertising expenditure for January 2022 to each sales district as shown in Table below.
At the end of the month, the sales volume for each region is recorded as also shown in Table
below.
Table 9: Values of Advertising Expenditure (in
MK10,000s) and Sales Volume (in 1000s) for Ten Sales
Districts

District Advert Expenditure, x Sales Volume, y

Chitipa 5 89
Rumphi 6 87
Mangochi 7 98
Ntcheu 8 110
Ntchisi 9 103
Dowa 10 114
Mzimba 11 116
Zomba 12 110
Blantyre 13 126
Lilongwe 14 130

To construct a scatter plot, we place the variable advertising expenditure (denoted x) on
the horizontal axis and we place the variable sales volume (denoted y) on the vertical axis.
For the first sales district, advertising expenditure equals 5 and sales volume equals 89. We
plot the point with coordinates x = 5 and y = 89 on the scatter plot to represent this sales
district. Points for the other sales districts are plotted similarly. The scatter plot shows that
there is a positive relationship between advertising expenditure and sales volume; that is,
higher values of sales volume are associated with higher levels of advertising expenditure.
We have drawn a straight line through the plotted points of the scatter plot to represent
the relationship between advertising expenditure and sales volume. We often do this when
the relationship between two variables appears to be a straight line, or linear. Of course,
the relationship between x and y in the figure below is not perfectly linear: not all of the points
in the scatter plot are exactly on the line. Nevertheless, because the relationship between
x and y appears to be approximately linear, it seems reasonable to represent the general
relationship between these variables using a straight line. In future lectures we will explain
ways to quantify such a relationship, that is, describe such a relationship numerically. We
will show that we can statistically express the strength of a linear relationship and that we
can calculate the equation of the line that best fits the points of a scatter plot.

[Scatter plot with fitted straight line omitted. Horizontal axis: Advertising Expenditure, x; vertical axis: Sales Volume, y.]

Sometimes a scatter plot shows a negative or inverse relationship between
two variables. In this case, the points follow a downward-sloping trend. A
scatter plot whose dots do not show any pattern and whose trend line is flat is said
to show no relationship.

2.3 Descriptive measures of data

2.3.1 Measures of central tendency

In the previous sections we looked at bar graphs, pie charts, frequency distributions, histograms,
and stem-and-leaf plots. These are techniques for describing data. Often, we are
interested in a typical numerical value to help us describe a data set. This typical value
is often called an average value or a measure of central tendency. We are looking for a
single number that is in some sense representative of the complete data set. There are many
different measures of central tendency. The three most widely used are the mean, the median,
and the mode.

a. Mean

The mean is the average value of a variable. It is denoted by µ (pronounced "mu") for the
population and by x̄ ("x bar") for the sample.

Example:
The number of 911 emergency calls classified as domestic disturbance calls in a low-density
area of Lilongwe city was sampled for thirty randomly selected 24-hour periods, with the
following results.
25 46 34 45 37 36 40 30 29 37 44 56 50 47 23
40 30 27 28 47 58 22 29 56 40 46 38 19 49 50
Find the mean number of calls per 24-hour period.
Solution
$$\bar{x} = \frac{\sum x}{n} = \frac{1158}{30} = 38.6$$

b. Median

The median of a set of data is a value that divides the bottom 50% of the data from the top
50% of the data represented by . To find the median of a data set, first arrange the data
in increasing order. If the number of observations is odd, the median is the number in the
middle of the ordered list. If the number of observations is even, the median is the mean of
the two values closest to the middle of the ordered list.
Example:
Given the following data: 32, 42, 46, 54, 46. Find the median.
Solution:
Sort the data. 32, 42, 46, 46, 54. The middle number is 46. Therefore, the median is 46.

c. Mode

The mode is the value in a data set that occurs the most often. If no such value exists, we
say that the data set has no mode. If two such values exist, we say the data set is bimodal.
If three such values exist, we say the data set is trimodal. There is no symbol that is used
to represent the mode.
Example: Find the mode of the data in the previous example.
Solution: The most frequently occurring value is 46 (it occurs twice), so the mode is 46.
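The three measures of central tendency can be checked with Python's standard `statistics` module. This is a small sketch using the five values from the median example; `multimode` returns every most-frequent value, which covers the bimodal and trimodal cases described above.

```python
import statistics

data = [32, 42, 46, 54, 46]

mean = statistics.mean(data)         # sum of the values divided by n
median = statistics.median(data)     # middle value of the sorted data
modes = statistics.multimode(data)   # list of all most-frequent values

print("mean:", mean)
print("median:", median)
print("mode(s):", modes)
```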

2.3.2 Measures of position

Measures of position are used to describe the location of a particular observation in relation
to the rest of the data set.

a. Percentile

Percentiles are values that divide the ranked data set into 100 equal parts. Percentiles provide
information about how the data are spread over the interval from the smallest value to the
largest value. The pth percentile is a value such that at least p percent of the observations
are less than or equal to this value and at least (100 − p) percent of observations are greater
than or equal to this value.
Calculating the percentile of an observation
The percentile for observation x is found by:

1. Arranging all observations in ascending order.

2. Dividing the number of observations less than x by the total number of observations.

3. Multiplying this quantity by 100. This percent is then rounded to the nearest
whole number to give the percentile for observation x.

b. Decile

A decile rank arranges the data in order from lowest to highest and is done on a scale of
one to 10 where each successive number corresponds to an increase of 10 percentage points.
This type of data ranking is performed as part of many academic and statistical studies in
the finance and economics field. There is no one way of calculating a decile; however, it is
important that you are consistent with whatever formula you decide to use to calculate a
decile. One simple calculation of a decile is:

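$$D_1 = \frac{1}{10} \times (n + 1)\text{th data value}$$

$$D_2 = \frac{2}{10} \times (n + 1)\text{th data value}$$

$$D_3 = \frac{3}{10} \times (n + 1)\text{th data value}$$

$$D_5 = \frac{5}{10} \times (n + 1)\text{th data value}$$

$$D_9 = \frac{9}{10} \times (n + 1)\text{th data value}$$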

c. Quartiles

Quartiles divide the data into four equal parts.

• 1st quartile = 25th percentile

• 2nd quartile = 50th percentile

• 3rd quartile = 75th percentile

Quartiles are found with the same percentile formula. The interquartile range, designated by IQR, is defined as
follows:
IQR = Q3 − Q1
The interquartile range shows the spread of the middle 50% of the data and is not affected
by extreme values (outliers) in the data set.
Box-and-whiskers displays (box plots) - A box-and-whiskers display (sometimes called
a box plot) is constructed by using Q1, the median, Q3, and the interquartile range. The box
extends from Q1 to Q3 and so contains the middle 50 percent of the data set. A line is drawn through the box
at the value of the median. This line divides the data set into two roughly equal parts. We next define
what we call the lower and upper limits. The lower limit is located 1.5 × IQR below Q1
and the upper limit is located 1.5 × IQR above Q3, so these limits are:
Q1 − 1.5(IQR) and Q3 + 1.5(IQR)
The lower and upper limits are used to identify outliers. An outlier is a measurement that is
separated from (that is, different from) most of the other measurements in the data set. A
measurement that is less than the lower limit or greater than the upper limit is considered
to be an outlier.
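The outlier rule can be sketched as two small functions. The household sizes and the quartile values used below are hypothetical, chosen only to illustrate the rule; they are not taken from the survey data behind Figure 4.

```python
def outlier_limits(q1, q3):
    """Return (lower limit, upper limit) located 1.5 * IQR beyond the quartiles."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


def find_outliers(data, q1, q3):
    """Return the measurements that fall outside the lower and upper limits."""
    lower, upper = outlier_limits(q1, q3)
    return [x for x in data if x < lower or x > upper]


# Hypothetical household sizes with assumed quartiles Q1 = 3 and Q3 = 5.
household_sizes = [1, 2, 3, 3, 4, 4, 5, 5, 6, 15]
print(outlier_limits(3, 5))
print(find_outliers(household_sizes, 3, 5))
```

Passing the quartiles in explicitly keeps the sketch independent of the particular percentile formula used to obtain Q1 and Q3.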

[Box plots omitted. Horizontal axis: Region (1, 2, 3); vertical axis: Household size.]

Figure 4: Box and Whisker for Household size by Region
d. Quintiles

Quintiles divide data sorted in ascending order into five equal groups, each
containing 20 percent of the data.

Example

Find the ninety-fifth percentile, the seventh decile, and the first quartile for the age distri-
bution given in Table below.
Age of Second year ODL Economics students

20 24 26 30
22 24 27 30
22 24 28 33
24 25 29 34

Solution
• To find P95, compute $i = \frac{np}{100} = \frac{16 \times 95}{100} = 15.2$. Because i is not a whole
number, we round up: the 95th percentile is the 16th observation in the arranged data set, which is 34.
• To find the 7th decile (same as P70), compute $i = \frac{np}{100} = \frac{16 \times 70}{100} = 11.2$.
Rounding up, the 7th decile is the 12th observation in the arranged data set, which is 29. Thus, at
least 70% of the students are aged 29 or younger.
• To find the first quartile (Q1, same as P25), compute $i = \frac{np}{100} = \frac{16 \times 25}{100} = 4$.
Because i is a whole number, the first quartile is the average of the observations in positions 4 and 5
in the ranked data set, that is, the average of 24 and 24, which is 24.
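The location rule applied in this solution can be written as a short function. This is a sketch of that one rule only (other textbooks and software use different conventions), restricted for simplicity to whole-number values of p; the function name is our own.

```python
def percentile(data, p):
    """Value of the pth percentile using the location i = n * p / 100.

    If i is a whole number, average the observations in positions i and i + 1;
    otherwise round i up and take the observation in that position.
    """
    if not 0 < p < 100:
        raise ValueError("p must be strictly between 0 and 100")
    ordered = sorted(data)
    whole, remainder = divmod(len(ordered) * p, 100)  # integer arithmetic, p an int
    if remainder == 0:
        return (ordered[whole - 1] + ordered[whole]) / 2
    return ordered[whole]  # position whole + 1, counting from 1


ages = [20, 22, 22, 24, 24, 24, 24, 25, 26, 27, 28, 29, 30, 30, 33, 34]
print("P95:", percentile(ages, 95))
print("D7 :", percentile(ages, 70))
print("Q1 :", percentile(ages, 25))
```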

2.3.3 Measurement of variation (dispersion)

In addition to measures of central tendency, it is desirable to have numerical values to

describe the spread or dispersion of a data set. Measures that describe the spread of a data
set are called measures of dispersion.

1. Range: the difference between the largest and smallest observations. For the student ages table shown
above, the range is 34 − 20 = 14 years.
2. Interquartile range: overcomes the dependency on extreme values. Denoted IQR.
The interquartile range for the student ages is found by subtracting the
value of Q1 from Q3. The first quartile is equal to the 25th percentile: i = (16 × 25)/100 = 4 is a
whole number, so Q1 is the average of the observations in positions 4 and 5, which is 24. The third
quartile is equal to the 75th percentile: i = (16 × 75)/100 = 12 is also a whole number, so Q3 is the
average of the observations in positions 12 and 13, that is, (29 + 30)/2 = 29.5. The IQR equals
29.5 − 24 = 5.5 years.


3. Variance: a measure of variability that uses all the data. It is the sum of the squared deviations from the
mean divided by the number of observations (or by n − 1 for a sample). Population variance is
denoted by σ² (sigma squared). Sample variance is denoted by s².

Population variance: $$\sigma^2 = \frac{\sum (x - \mu)^2}{N}$$
Sample variance: $$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$$

4. Standard deviation: It is the positive square root of the variance.

i. Population standard deviation is: $$\sigma = \sqrt{\frac{\sum (x - \mu)^2}{N}}$$
ii. Sample standard deviation is: $$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$$

5. Coefficient of variation: The coefficient of variation (CV) is the ratio of the standard
deviation to the mean and shows the extent of variability in relation to the mean of
the population. The higher the CV, the greater the dispersion.

CV = Standard Deviation/Mean

The standard deviation is useful as a measure of variation within a given set of data.
When one desires to compare the dispersion in two sets of data, however, comparing
the two standard deviations may lead to fallacious results. It may be that the two
variables involved are measured in different units. For example, we may wish to know,
for a certain population, whether serum cholesterol levels, measured in milligrams per
100 ml, are more variable than body weight, measured in kgs. Furthermore, although
the same unit of measurement is used, the two means may be quite different. If we
compare the standard deviation of weights of first-grade children with the standard
deviation of weights of high school freshmen, we may find that the latter standard
deviation is numerically larger than the former, because the weights themselves are
larger, not because the dispersion is greater.

Example

The times required in minutes for students to solve a particular math problem were 5, 10,
15, 3, and 7. Calculate the standard deviation.

Solution

The mean time for the five students is 8 minutes. The table below illustrates the computation
indicated by the sample variance formula. The first column lists the observations, x. The second column lists the
deviations from the mean, x − x̄. The third column lists the squares of the deviations. The
sum at the bottom of the second column is called the sum of the deviations, and is always
equal to zero for any data set. The sum at the bottom of the third column is referred to as
the sum of the squares of the deviations. The sample variance is obtained by dividing the
sum of the squares of the deviations by n − 1, or 5 − 1 = 4. The sample variance equals 88
divided by 4, which is 22 minutes squared. The sample standard deviation is the square root
of 22, or about 4.69 minutes.

x     x − x̄           (x − x̄)²
5     5 − 8 = −3      (−3)² = 9
10    10 − 8 = 2      (2)² = 4
15    15 − 8 = 7      (7)² = 49
3     3 − 8 = −5      (−5)² = 25
7     7 − 8 = −1      (−1)² = 1
Sum   Σ(x − x̄) = 0    Σ(x − x̄)² = 88
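The table can be reproduced step by step in Python. The sketch below follows the columns of the table and then applies CV = standard deviation / mean from the coefficient of variation item above; the variable names are our own.

```python
import math
import statistics

times = [5, 10, 15, 3, 7]  # minutes taken to solve the problem

mean = statistics.mean(times)
deviations = [x - mean for x in times]          # second column of the table
squared = [d ** 2 for d in deviations]          # third column of the table

sum_of_squares = sum(squared)
sample_variance = sum_of_squares / (len(times) - 1)   # divide by n - 1
sample_sd = math.sqrt(sample_variance)
cv = sample_sd / mean                                  # coefficient of variation

print("sum of deviations:", sum(deviations))
print("sum of squares:", sum_of_squares)
print("sample variance:", sample_variance)
print(f"sample standard deviation: {sample_sd:.2f}")
print(f"coefficient of variation: {cv:.3f}")
```

The library functions `statistics.variance` and `statistics.stdev` use the same n − 1 divisor and can be used to cross-check the manual computation.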

2.3.4 Measures of distribution (shape)

a. Kurtosis

Just as we may describe a distribution in terms of skewness, we may describe a distribution
in terms of kurtosis. Kurtosis is a measure of the degree to which a distribution is “peaked”
or flat in comparison to a normal distribution whose graph is characterized by a bell-shaped
appearance. A distribution, in comparison to a normal distribution, may possess a smaller
proportion of observations in its tails, so that its graph exhibits a flattened appearance.
Such a distribution is said to be platykurtic. Conversely, a distribution, in comparison to a
normal distribution, may possess an excessive proportion of observations in its tails, so that its
graph exhibits a more peaked appearance. Such a distribution is said to be leptokurtic. A
normal, or bell-shaped distribution, is said to be mesokurtic. Kurtosis can be expressed as

$$\text{kurtosis} = \frac{n \sum (x - \bar{x})^4}{\left(\sum (x - \bar{x})^2\right)^2} - 3$$

[Density plot omitted: Comparison of normal and t distribution. Curves: Normal, t-distribution; horizontal axis from −4 to 4.]

Figure 5: Leptokurtic vs Mesokurtic

As noted, the normal distribution is a good example of a mesokurtic distribution,
and in this plot it is depicted with a black density curve. The t-distribution is a good example of
a leptokurtic distribution, as it has a more slender peak and fatter tails than the normal distribution.
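As a rough illustration, a kurtosis coefficient can be computed directly from the deviations. The sketch assumes the common moment-based definition n Σ(x − x̄)⁴ / (Σ(x − x̄)²)² − 3, for which a normal distribution gives a value near 0, a platykurtic distribution a negative value, and a leptokurtic distribution a positive value. The function name is our own.

```python
def excess_kurtosis(data):
    """Moment-based kurtosis minus 3 (assumed definition, see lead-in)."""
    n = len(data)
    mean = sum(data) / n
    sum_sq = sum((x - mean) ** 2 for x in data)       # sum of squared deviations
    sum_fourth = sum((x - mean) ** 4 for x in data)   # sum of fourth-power deviations
    return n * sum_fourth / sum_sq ** 2 - 3


# Evenly spread values have light tails, so the result is negative (platykurtic).
print(excess_kurtosis([1, 2, 3, 4, 5]))

# The problem-solving times from the standard deviation example.
print(excess_kurtosis([5, 10, 15, 3, 7]))
```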

b. Skewness

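The shape of the distribution can be skewed either to the left or to the right. Skewness is
given by:

$$\text{Skewness} = \frac{\text{Mean} - \text{Mode}}{\text{Standard deviation}}$$

A negative value indicates a distribution skewed to the left, and a positive value indicates a
distribution skewed to the right.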

[Density plot omitted. Curves: Left skewed, Normal, Right skewed; horizontal axis: x; vertical axis: y.]

Figure 6: Skewness in data distribution
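The skewness formula (mean − mode) / standard deviation can be tried on the five values from the median example (32, 42, 46, 54, 46). The sketch below assumes the sample standard deviation (n − 1 divisor) and a data set with a single mode; the function name is our own.

```python
import statistics


def mode_skewness(data):
    """(mean - mode) / sample standard deviation, for data with one mode."""
    return (statistics.mean(data) - statistics.mode(data)) / statistics.stdev(data)


data = [32, 42, 46, 54, 46]
print(mode_skewness(data))  # negative: the mean lies below the mode
```

Here the mean is 44, the mode is 46 and the sample standard deviation is 8, so the coefficient is negative, pointing to a slight left skew.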

Chebyshev’s Theorem

If we fear that the Empirical Rule does not hold for a particular population, we can
consider using Chebyshev's Theorem to find an interval that contains a specified percentage of
the individual measurements in the population. Although Chebyshev's Theorem technically
applies to any population, we will see that it is not as practically useful as we might hope.
Chebyshev's theorem states that the fraction of any data set lying within k standard
deviations of the mean is at least 1 − 1/k², where k is a number greater than 1. For example, if
we choose k equal to 2, then at least 100(1 − 1/2²)% = 100(3/4)% = 75% of the population
measurements lie in the interval [µ ± 2σ]. As another example, if we choose k equal to 3, then
at least 100(1 − 1/3²)% = 100(8/9)% = 88.89% of the population measurements lie in the
interval [µ ± 3σ].
The theorem applies to either a sample or a population. The implication of the theorem
for k = 2, 3, and 4 standard deviations is that:

a. At least 0.75 or 75% of the data values must be within 2 standard deviations of the mean.

b. At least 0.89 or 89% of the data values must be within 3 standard deviations of the
mean.

c. At least 0.94 or 94% of the data values must be within 4 standard deviations of the
mean.
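The bound 1 − 1/k² is easy to tabulate. The following minimal sketch evaluates it for a few values of k; the function name is our own.

```python
def chebyshev_bound(k):
    """Minimum fraction of any data set within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("Chebyshev's theorem requires k > 1")
    return 1 - 1 / k ** 2


for k in (1.5, 2, 3, 4):
    print(f"k = {k}: at least {100 * chebyshev_bound(k):.2f}% of the data")
```

Note that k does not have to be a whole number: for k = 1.5 the bound is 1 − 1/2.25, or about 55.56%.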

Empirical Rule

A practical interpretation of the standard deviation: the Empirical Rule. One type of
relative frequency curve describing a population is the normal curve. The normal curve is a
symmetrical, bell-shaped curve. If a population is described by a normal curve, we say that
the population is normally distributed, and the following result can be shown to hold.

1. 68.26 percent of the population measurements are within (plus or minus) one standard
deviation of the mean and thus lie in the interval [µ − σ, µ + σ].
2. 95.44 percent of the population measurements are within (plus or minus) two standard
deviations of the mean and thus lie in the interval [µ − 2σ, µ + 2σ].
3. 99.73 percent of the population measurements are within (plus or minus) three standard
deviations of the mean and thus lie in the interval [µ − 3σ, µ + 3σ].
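These percentages come from the normal curve and can be checked numerically. The sketch below uses the standard identity that, for a normal population, the share within k standard deviations of the mean equals erf(k / √2), with `math.erf` from the Python standard library. The results agree with the figures above up to the last digit, since values such as 68.26 and 95.44 come from rounded printed tables.

```python
import math


def normal_share_within(k):
    """Share of a normal population within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {100 * normal_share_within(k):.2f}%")
```

Comparing these shares with the Chebyshev bounds (75% for k = 2, 88.89% for k = 3) shows how much sharper the statement becomes once the population is known to be normal.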

Activity

1. Table 2.8 gives the ages of cars randomly selected from Lilongwe Civil servants. Find
the percentiles for the ages 10, 15, and 20.
Table 2.8
2 7 11 15 19
2 7 11 15 19
2 7 12 15 20
2 7 12 15 20
4 7 12 15 20
4 10 14 15 22
4 10 14 16 24
4 10 14 16 25
5 10 14 17 25
5 10 15 17 27

Solution

The age 10 is the thirtieth percentile. The age 15 is the fifty-eighth percentile. The
age 20 is the eighty-fourth percentile.

2. Find P90 , D8 , and Q3 for the civil servant cars’ ages in Table 2.8.

Solution

P90 = 21, D8 = 18, and Q3 = 16
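The answers to the first activity can be verified with the three-step percentile procedure given under Measures of position (count the observations below x, divide by n, multiply by 100 and round). The sketch below applies it to the car ages in Table 2.8; the function name is our own.

```python
# Car ages from Table 2.8, entered row by row.
car_ages = [
    2, 7, 11, 15, 19,
    2, 7, 11, 15, 19,
    2, 7, 12, 15, 20,
    2, 7, 12, 15, 20,
    4, 7, 12, 15, 20,
    4, 10, 14, 15, 22,
    4, 10, 14, 16, 24,
    4, 10, 14, 16, 25,
    5, 10, 14, 17, 25,
    5, 10, 15, 17, 27,
]


def percentile_rank(data, x):
    """Percent of observations strictly less than x, rounded to a whole number."""
    below = sum(1 for value in data if value < x)
    return round(100 * below / len(data))


for age in (10, 15, 20):
    print(f"age {age}: percentile {percentile_rank(car_ages, age)}")
```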

Reading assignment (Reading list)

Anderson, D. R., Sweeney, D. J., Williams, T. A. (2002). Essentials of Statistics for Business
and Economics. 2nd Edition. South Western College Publishing.
Kazmier, L. J. (2004). Business Statistics. Schaum's Outlines. DOI: 10.1036/0071430997

