STAE lecture notes_LU3_Annotated

Learning Unit 3 focuses on descriptive statistics, covering measures of central tendency (mean, median, mode) and variability (range, interquartile range, variance, standard deviation, coefficient of variation). It explains how to calculate these measures using raw and frequency data, and emphasizes the importance of selecting appropriate measures based on data characteristics. Additionally, it introduces percentiles and their interpretation in relation to data distribution.

Uploaded by

michaeljxmes

Learning Unit 3: DESCRIPTIVE STATISTICS

LEARNING OBJECTIVES
• Understand the concepts of and calculate the mean, median, mode and percentiles
• Understand the concepts of and calculate the range, interquartile range, standard deviation, variance
and coefficient of variation
• Choose the appropriate measures of central tendency and variability for any given variable

Textbook reference: Chapter 1

3.1. Introduction
Descriptive statistics are numerical summary measures used to describe the data collected from a sample in
terms of central tendency, variability, skewness and kurtosis. These measures are used in most statistical
analyses. In this course, measures of central tendency and variability are calculated using raw and frequency
data, skewness is only evaluated visually, and kurtosis is not assessed.

3.2. Measures Of Central Tendency

A measure of central tendency (or location) is a single value that summarises the centre of a distribution. The
commonly used measures of central tendency are mode, median and mean.

3.2.1. Mode
For raw data and ungrouped frequency data the mode is the value(s) of the variable that occur(s) most
frequently. A variable can have one, two, more than two, or no mode.
• Unimodal = one mode
• Bimodal = two modes
• Multimodal = more than two modes

For grouped frequency data it is not possible to identify the most frequent value(s), since the data were grouped
into class intervals and information was lost. For such data the class(es) with the highest frequency
is/are the modal class(es), and the mode is generally estimated using the midpoint of the modal class(es).
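
The two cases above can be sketched in Python. This is only an illustration: the function names and the grouped table are hypothetical, and the sketch assumes the common convention that a variable has no mode when every value occurs equally often.

```python
from collections import Counter


def modes(values):
    """Return the most frequent value(s); [] when all values are equally frequent."""
    counts = Counter(values)
    highest = max(counts.values())
    # Assumed convention: if every distinct value has the same count
    # (and there is more than one distinct value), report no mode.
    if len(counts) > 1 and all(c == highest for c in counts.values()):
        return []
    return sorted(v for v, c in counts.items() if c == highest)


def modal_class_midpoints(classes):
    """classes: list of (lower, upper, frequency); return midpoint(s) of the modal class(es)."""
    highest = max(freq for _, _, freq in classes)
    return [(lower + upper) / 2 for lower, upper, freq in classes if freq == highest]


print(modes([3, 4, 4, 6, 9]))   # one mode (unimodal)
print(modes([1, 1, 2, 2, 3]))   # two modes (bimodal)
print(modes([1, 2, 3]))         # no mode
# Hypothetical grouped table: (0, 10] -> 3, (10, 20] -> 8, (20, 30] -> 5
print(modal_class_midpoints([(0, 10, 3), (10, 20, 8), (20, 30, 5)]))
```

Returning a list rather than a single number keeps the unimodal, bimodal, multimodal and no-mode cases in one interface.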

3.2.2. Median
The median is the value of the variable in the middle of the ordered set of data values. Therefore, at most 50%
of observations are below the median value, and at most 50% of observations are above the median value.

To find the median for raw data
• Order the data from lowest to highest
• Find the median position = $\frac{n + 1}{2}$
• If n is odd, the median position value will be a whole number
o The median value is the value of the variable in the median position of the ordered data
o For example, for the ordered observations: 3 4 6 9 13
o Since n = 5, the median position = $\frac{5 + 1}{2} = \frac{6}{2} = 3$
o The value in position 3 of the ordered data is 6, i.e., median = 6
• If n is even, the median position value will be a fraction
o The median value is the average of the two variable values on either side of the median position in the ordered data
o For example, for the ordered observations: 3 4 6 9
o Since n = 4, the median position = $\frac{4 + 1}{2} = \frac{5}{2} = 2.5$
o The value in position 2 of the ordered data is 4, and the value in position 3 of the ordered data is 6, i.e., median = $\frac{4 + 6}{2} = \frac{10}{2} = 5$
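
The steps above can be sketched in Python as follows. The function name `median_by_position` is an illustrative choice, not something defined in the course; positions are 1-based as in the notes, so one is subtracted to obtain Python's 0-based index.

```python
def median_by_position(values):
    """Median of raw data using the (n + 1) / 2 position rule."""
    if not values:
        raise ValueError("median is undefined for an empty dataset")
    ordered = sorted(values)             # step 1: order from lowest to highest
    n = len(ordered)
    position = (n + 1) / 2               # step 2: 1-based median position
    k = int(position)                    # integer part of the position
    if n % 2 == 1:
        return ordered[k - 1]            # odd n: value at the median position
    # even n: average the values on either side of the position (k and k + 1)
    return (ordered[k - 1] + ordered[k]) / 2


print(median_by_position([3, 4, 6, 9, 13]))  # odd n example from the notes
print(median_by_position([3, 4, 6, 9]))      # even n example from the notes
```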

For ungrouped frequency tables the median is calculated using cumulative frequencies. For grouped frequency
tables the median is estimated using cumulative frequencies and an interpolation formula. However, this is
beyond the scope of this course.

3.2.3. Mean
The mean of a variable is also referred to as the arithmetic mean or the average. For raw data the mean is
calculated by adding all the values of the variable together and dividing by the total number of observations.

For a random variable X, the population mean is denoted by the Greek letter μ (mu):
$\mu = \frac{1}{N}\sum x$
For a random variable X, the sample mean is denoted by $\bar{x}$ (x-bar):
$\bar{x} = \frac{1}{n}\sum x$

For example, consider the random sample with observations: 9 2 4 13 6
$\bar{x} = \frac{9 + 2 + 4 + 13 + 6}{5} = \frac{34}{5} = 6.8$

For an ungrouped frequency table, the mean is calculated using a formula based on the values of the variable
and the frequency of occurrence. For a grouped frequency table, the mean is estimated using a formula based
on the midpoint of the class intervals and the frequency of occurrence. For the purpose of this course, it is
sufficient to calculate/estimate the mean from frequency tables using the calculator.
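
The notes leave the frequency-table formulas to the calculator. As a hedged sketch, the usual frequency-weighted form (sum of value × frequency, divided by the total frequency) is assumed below, with class midpoints standing in for the values of a grouped table. The function names and both small tables are hypothetical.

```python
def mean_from_frequencies(values, frequencies):
    """Mean of an ungrouped frequency table: sum(f * x) / sum(f)."""
    total = sum(frequencies)
    if total == 0:
        raise ValueError("total frequency must be positive")
    return sum(x * f for x, f in zip(values, frequencies)) / total


def mean_from_classes(classes):
    """Estimated mean of a grouped table given as (lower, upper, frequency) rows."""
    midpoints = [(lower + upper) / 2 for lower, upper, _ in classes]
    frequencies = [freq for _, _, freq in classes]
    return mean_from_frequencies(midpoints, frequencies)


# Hypothetical ungrouped table: value 1 occurs twice, 2 five times, 3 three times
print(mean_from_frequencies([1, 2, 3], [2, 5, 3]))
# Hypothetical grouped table with midpoints 5, 15 and 25
print(mean_from_classes([(0, 10, 2), (10, 20, 5), (20, 30, 3)]))
```

The grouped result is only an estimate, because every observation in a class is treated as if it sat at the class midpoint.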

Steps to find the mean using the calculator

1) Enter data
2) AC
3) STAT → 4:VAR → 2: $\bar{x}$ → =

3.2.4. Concluding notes

Mode
• Advantages: Valid for categorical and numerical data; not affected by outliers
• Disadvantages: More than one mode can exist
Median
• Advantages: Not affected by outliers; best measure to use for skewed data
• Disadvantages: Only appropriate for ordinal and numerical data
Mean
• Advantages: Calculated using every value in the dataset, i.e., very accurate; best measure to use for symmetrical data
• Disadvantages: Affected by extreme values; only appropriate for numerical data

3.3. Measures Of Relative Standing

Measures of relative standing show where particular values stand relative to the whole distribution of the
variable. Relative standing is measured through percentiles. Percentiles are points which partition an ordered
dataset into a hundred parts. The rth percentile, Pr, is the value of the variable that separates the lowest r% of
the distribution from the remaining (100 – r)% of the distribution. The formal interpretation of Pr is: at most
r% is less than Pr and at most (100 – r)% is more than Pr.

For example, if 10% of students scored at least 80 on a test, then a student who scored 82 performed in the
top 10% of the distribution. The value “80” is the minimum value obtained by the top 10% of the distribution
and is therefore the 90th percentile, i.e., P90 = 80, as it separates the lowest 90% from the remaining 10% of
the distribution. Therefore, at most 90% of students scored less than 80 and at most 10% of students scored
more than 80.
Recall the interpretation of the median, namely at most 50% of observations are below the median value and
at most 50% of observations are above the median value. The median of a distribution is the 50th percentile
value, i.e., P50 = median. Other commonly used percentiles are deciles, which divide the distribution into ten
equal parts (D1, D2, …, D10) and quartiles, which divide the distribution into four equal parts (Q1, Q2, Q3, Q4).
Both deciles and quartiles can be expressed in terms of percentiles. For example, D5 = Q2 = P50 = median. For
raw data any percentile value is obtained by first sorting the data from lowest to highest, locating the percentile
position and then using a formula to calculate the percentile value. Percentile calculation from frequency data
is beyond the scope of this course.

To find Pr for raw data:

• Order the data from lowest to highest
• Find the percentile position = $\frac{r}{100}(n + 1)$
o This yields a value in the format k.d, where k = the integer portion and d = the decimal portion (in decimal format)
• $P_r = x_{(k)} + d\left(x_{(k+1)} - x_{(k)}\right)$
o Where $x_{(k)}$ is the value in position k of the ordered dataset
o Where $x_{(k+1)}$ is the value in position (k + 1) of the ordered dataset

For example, find and interpret P20 and Q3 for the following 12 observations (already ordered):
4 5 8 9 11 12 12 14 15 17 19 21
• P20
o Position = $\frac{20}{100}(12 + 1) = 2.6$, therefore k = 2 and d = 0.6
o The value in position 2 (k) is 5 and the value in position 3 (k + 1) is 8
o $P_{20} = x_{(2)} + 0.6\left(x_{(3)} - x_{(2)}\right) = 5 + 0.6(8 - 5) = 6.8$
o At most 20% of observations are less than 6.8 and at most 80% of observations are greater than 6.8

• Q3 = P75
o Position = $\frac{75}{100}(12 + 1) = 9.75$, therefore k = 9 and d = 0.75
o The value in position 9 (k) is 15 and the value in position 10 (k + 1) is 17
o $P_{75} = x_{(9)} + 0.75\left(x_{(10)} - x_{(9)}\right) = 15 + 0.75(17 - 15) = 16.5$
o At most 75% of observations are less than 16.5 and at most 25% of observations are greater than 16.5
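
The position-and-interpolation procedure above can be sketched in Python. The function name `percentile` is illustrative. Returning the smallest or largest observation when the position falls outside 1..n is an assumption of this sketch, since the notes do not cover that case.

```python
def percentile(values, r):
    """r-th percentile of raw data using the (r / 100) * (n + 1) position rule."""
    ordered = sorted(values)             # order the data from lowest to highest
    n = len(ordered)
    position = r * (n + 1) / 100         # 1-based percentile position, format k.d
    k = int(position)                    # integer portion
    d = position - k                     # decimal portion
    # Assumption (not in the notes): clamp positions that fall outside 1..n
    if k < 1:
        return ordered[0]
    if k >= n:
        return ordered[-1]
    # P_r = x_(k) + d * (x_(k+1) - x_(k)), with 0-based indexing in Python
    return ordered[k - 1] + d * (ordered[k] - ordered[k - 1])


data = [4, 5, 8, 9, 11, 12, 12, 14, 15, 17, 19, 21]
print(round(percentile(data, 20), 4))    # P20 from the worked example
print(round(percentile(data, 75), 4))    # Q3 = P75 from the worked example
```

Note that statistical software often uses slightly different position rules, so results from other tools may not match the hand calculation exactly.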

3.4. Measures Of Variability
Measures of variability (or spread or dispersion) describe the extent to which data are spread around their central
tendency and across the scale. The commonly used measures of variability are range, interquartile range,
variance, standard deviation and coefficient of variation.

3.4.1. Range
The range is an approximate measure of variability and shows how much of the scale is utilised. For raw data
and ungrouped frequency data the range is the difference between the maximum and the minimum values of
a variable. For grouped frequency data the range is the difference between the upper limit of the last class
interval and the lower limit of the first class interval.
Range = maximum – minimum

3.4.2. Interquartile range

The interquartile range (IQR) is the distance between the 1st and 3rd quartiles. It gives a measure of how the
middle 50% of the distribution is spread around the median.
IQR = Q3 – Q1
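
As an illustrative sketch, the range and IQR of the 12 ordered observations used in Section 3.3 can be computed as below. The quartiles are obtained with the same (n + 1) position rule as in Section 3.3; the helper name `quartile_value` is hypothetical, and the sketch assumes the position lies strictly between 1 and n.

```python
def quartile_value(ordered, r):
    """Percentile P_r of already ordered data via the (r / 100) * (n + 1) rule.

    Assumes the resulting position lies between 1 and n (true for Q1 and Q3 here).
    """
    position = r * (len(ordered) + 1) / 100
    k = int(position)                    # integer portion of the position
    d = position - k                     # decimal portion of the position
    return ordered[k - 1] + d * (ordered[k] - ordered[k - 1])


data = sorted([4, 5, 8, 9, 11, 12, 12, 14, 15, 17, 19, 21])

data_range = data[-1] - data[0]          # Range = maximum - minimum
q1 = quartile_value(data, 25)            # Q1 = P25
q3 = quartile_value(data, 75)            # Q3 = P75
iqr = q3 - q1                            # IQR = Q3 - Q1

print(data_range, q1, q3, iqr)
```

The range uses only the two most extreme observations, whereas the IQR ignores the lowest and highest quarters of the data, which is why only the range reacts to outliers.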

3.4.3. Average deviation

The average deviation is the arithmetic mean of the differences between each observation and the mean of the
variable.
Average deviation = $\frac{1}{n}\sum (x - \bar{x})$

Consider the observations: 2 2 1 3

$\bar{x} = 2$ → $(x - \bar{x})$: 0 0 −1 1 → $\frac{1}{4}\sum (x - \bar{x}) = 0$
Because the mean is the arithmetic centre of the distribution, some observations are less than the mean (i.e.,
negative difference) and some observations are greater than the mean (i.e., positive difference). The negative
and positive values completely cancel out across all observations. The sum of the differences is therefore always
equal to zero, which makes the average deviation useless as a measure of variability; it serves only as a starting
point for measuring how data are spread around the mean.
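
A short Python sketch makes the cancellation visible for the observations 2 2 1 3. The function name is illustrative. For data whose mean is not exactly representable in floating point, the computed sum may differ from zero by a tiny rounding error.

```python
def average_deviation(values):
    """Return the deviations from the mean and their arithmetic mean."""
    n = len(values)
    mean = sum(values) / n
    deviations = [x - mean for x in values]   # (x - x_bar) for each observation
    return deviations, sum(deviations) / n


deviations, avg_dev = average_deviation([2, 2, 1, 3])
print(deviations)   # negative and positive deviations
print(avg_dev)      # they cancel, so the average deviation is zero
```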

3.4.4. Variance
To solve the problem encountered with the average deviation measure, differences are considered as distances
which must always be positive. There are two ways in which negative values can be removed: either take the
absolute value (i.e., remove the sign), or square the value. The variance is the average squared deviation around
the mean. It is the most commonly used measure of variability in statistics. The larger the value of the variance
the more the data values vary around the mean and the greater the spread of the data. The variance is expressed
in the squared unit of measurement of a variable, which is of no practical value and is difficult to interpret.

The population variance is denoted by the Greek letter σ² (sigma-squared) and is calculated as follows:
$\sigma^2 = \frac{1}{N}\sum (x - \mu)^2$

The sample variance is denoted by the Roman letter s² (s-squared) and is calculated as follows:
$s^2 = \frac{1}{n - 1}\sum (x - \bar{x})^2 = \frac{n\sum x^2 - \left(\sum x\right)^2}{n(n - 1)}$
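
As a hedged illustration, both forms of the sample variance formula can be checked against each other in Python, using the sample 9 2 4 13 6 from Section 3.2.3. The function names are illustrative; `statistics.variance` from the Python standard library is used only as a cross-check.

```python
import statistics


def sample_variance_definitional(values):
    """s^2 = sum((x - x_bar)^2) / (n - 1)."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)


def sample_variance_computational(values):
    """s^2 = (n * sum(x^2) - (sum(x))^2) / (n * (n - 1))."""
    n = len(values)
    sum_x = sum(values)
    sum_x2 = sum(x ** 2 for x in values)
    return (n * sum_x2 - sum_x ** 2) / (n * (n - 1))


data = [9, 2, 4, 13, 6]
print(round(sample_variance_definitional(data), 4))
print(round(sample_variance_computational(data), 4))
print(round(statistics.variance(data), 4))   # standard-library cross-check
```

The computational form needs only n, the sum of x and the sum of x², which is why it is convenient when just those sums are available.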

For an ungrouped frequency table, the variance is calculated based on the values of the variable and the
frequency of occurrence. For a grouped frequency table, the variance is estimated using a formula based on
the midpoint of the class intervals and the frequency of occurrence. For the purpose of this course, it is
sufficient to calculate/estimate the variance from frequency tables using the calculator. The calculator gives
the population and sample standard deviations, which must be squared to obtain the variance. Steps to perform
calculations are discussed in Section 3.4.5.

3.4.5. Standard deviation

The standard deviation is the positive square root of the variance. It is expressed in terms of the unit of
measurement of the variable. Under certain distributional assumptions, the standard deviation has a very
particular and practical interpretation (further detail is given in Section 5.3.4).

The population standard deviation is denoted by the Greek symbol σ (sigma) and is calculated as follows:
$\sigma = \sqrt{\frac{1}{N}\sum (x - \mu)^2}$

The sample standard deviation is denoted by the Roman letter s and is calculated as follows:
$s = \sqrt{\frac{1}{n - 1}\sum (x - \bar{x})^2} = \sqrt{\frac{n\sum x^2 - \left(\sum x\right)^2}{n(n - 1)}}$

Steps to find the standard deviation using the calculator

1) Enter data
2) AC
3) STAT → 4:VAR → 3: σx → = (population standard deviation)
STAT → 4:VAR → 4: sx → = (sample standard deviation)
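
For readers working without the calculator, a rough Python equivalent of the two calculator outputs is sketched below, using the sample 9 2 4 13 6 from Section 3.2.3. The variable names `sigma_x` and `s_x` simply mirror the calculator labels; `statistics.pstdev` and `statistics.stdev` are standard-library functions that divide by N and by n − 1 respectively.

```python
import statistics

data = [9, 2, 4, 13, 6]

sigma_x = statistics.pstdev(data)   # population standard deviation (divides by N)
s_x = statistics.stdev(data)        # sample standard deviation (divides by n - 1)

# Squaring the standard deviation recovers the variance
sample_variance = s_x ** 2

print(round(sigma_x, 4), round(s_x, 4), round(sample_variance, 4))
```

Because the sample formula divides by n − 1 rather than N, the sample standard deviation is always slightly larger than the population version computed from the same values.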

3.4.6. Coefficient of variation

The coefficient of variation (CV) is a measure of relative variability and is used to compare the variability of
different variables measured in different units or to compare the variability of the same variables measured at
different times.

The CV is the ratio of the standard deviation to the mean, expressed as a percentage, i.e., the variability in the
variable is expressed as a percentage of the mean of that variable. This measures variability on comparable
scales for multiple variables. Note, this value is not bounded by 100% and can be greater than 100%, which
implies more variability.

The sample coefficient of variation is calculated as follows:

$CV = \frac{s}{\bar{x}} \times 100$
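
The following Python sketch illustrates why the CV is useful for comparisons: it is unchanged when the unit of measurement is rescaled. All data below are hypothetical, and the function name is an illustrative choice. Raising an error when the mean is zero is an assumption of the sketch, since the ratio is undefined in that case.

```python
import statistics


def coefficient_of_variation(values):
    """Sample CV = (s / x_bar) * 100, expressed as a percentage."""
    mean = statistics.mean(values)
    if mean == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return statistics.stdev(values) / mean * 100


# Hypothetical samples measured in different units
heights_m = [1.60, 1.72, 1.68, 1.81, 1.75]
heights_cm = [h * 100 for h in heights_m]
masses_kg = [55, 72, 64, 90, 78]

print(round(coefficient_of_variation(heights_m), 2))
print(round(coefficient_of_variation(heights_cm), 2))   # same CV despite the unit change
print(round(coefficient_of_variation(masses_kg), 2))    # comparable with the height CV
```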

3.4.7. Concluding notes

Range
• Advantages: Easy to calculate
• Disadvantages: Affected by extreme values
Interquartile range
• Advantages: Not affected by outliers; best measure of variability for skewed data
• Disadvantages: Does not utilise all the data
Average deviation
• Advantages: None
• Disadvantages: Always zero
Variance/Standard deviation
• Advantages: Uses all available data; best measure of variability for symmetrical data
• Disadvantages: Affected by outliers
Coefficient of variation
• Advantages: Best measure of relative variability
• Disadvantages: Affected by outliers

Exercise 3.1
The sums for X = coffee consumption are $\sum x = 59$ and $\sum x^2 = 251$. The following table shows the frequency
distribution for coffee consumption. Calculate the mean, range, variance, standard deviation and coefficient
of variation using the computational formulae as well as the calculator. Compare the results.
Coffee consumption Frequency
1 5
2 6
3 3
4 2
5 2
6 0
7 1
8 1
Total 20

From table
Mean =
Range =
Variance =
Standard deviation =
Coefficient of variation =

From sums: $\sum x = 59$, $\sum x^2 = 251$

Mean = $\bar{x} = \frac{1}{n}\sum x$ =

Variance = $s^2 = \frac{n\sum x^2 - \left(\sum x\right)^2}{n(n - 1)}$ =

Standard deviation = s =

Coefficient of variation = $\frac{s}{\bar{x}} \times 100$ =

Comparison
Exercise 3.2
Use the raw data for the coffee affinity score as well as the grouped frequency table to calculate the mean,
range, variance, standard deviation and coefficient of variation. Compare the results.
Coffee affinity score Frequency Midpoint
(0, 1] 7
(1, 2] 4
(2, 3] 2
(3, 4] 4
(4, 5] 3
Total 20

From table
Mean =
Range =
Variance =
Standard deviation =
Coefficient of variation =

From raw data

0.1 0.2 0.4 0.4 0.6 0.8 1.0 1.4 1.8 1.9 1.9 2.3 2.4 3.1 3.1 3.4 3.6 4.4 4.6 4.9
Mean =
Range =
Variance =
Standard deviation =
Coefficient of variation =

Comparison

Exercise 3.3
Use the following stem-and-leaf plot of age (leaf unit = 1) and calculate the mode(s), median, D2 and IQR.

Stem Leaf
1 9 9 9
2 1 4 4 5 6 6 8 9 9
3 0 2 4 5 6 7
4 0 3

From raw data

Mode(s)

Median

P25

P75

IQR
