
Measures of Dispersion

The document discusses measures of dispersion, which quantify the spread of data, and categorizes them into absolute and relative types. It covers various measures including range, variance, standard deviation, quartile deviation, mean deviation, coefficient of variation, and interquartile range, providing definitions, formulas, and examples for each. The document emphasizes the importance of these measures in understanding data variability and distribution.

Uploaded by

KHILY SAXENA
Copyright
© © All Rights Reserved
MEASURES OF DISPERSION
DR. REKHA PRASAD
IM, BHU
INTRODUCTION
 Measures of dispersion are used to determine the spread of data. They
can be classified into two types: absolute (with the same units as the data)
and relative (no unit) measures of dispersion. Common measures include:
 Range
 Variance
 Standard deviation
 Quartile deviation
 Mean deviation
 Coefficient of variation
 Interquartile range (IQR)
RANGE
 The range in statistics for a given data set is the difference between the
highest and lowest values. For example, if the given data set is {2,5,8,10,3},
then the range will be 10 – 2 = 8.
 Range Formula: The formula of the range in statistics can simply be given
by the difference between the highest and lowest values.
PROBLEM
 EXAMPLE: Find the range of given observations: 32, 41, 28, 54, 35, 26, 23,
33, 38, 40.
 SOLUTION: Let us first arrange the given values in ascending order.
 23, 26, 28, 32, 33, 35, 38, 40, 41, 54
 Since 23 is the lowest value and 54 is the highest value, the range of
the observations is:
 Range (X) = Max (X) – Min (X)
 = 54 – 23
 = 31
 Hence, 31 is the required answer.
CONT…
 In statistics, the range is a measure of the spread or variability of a dataset.
It represents the difference between the largest (maximum) and the smallest
(minimum) values in the dataset. Here's the formula:
 Range = Maximum value - Minimum value
 Key Points About Range:
 The range gives a quick idea of how spread out the data is but does not
account for the distribution or any outliers in the data.
 It's easy to calculate and interpret.
 While useful, it isn't always reliable as a sole measure of variability since
it depends only on the two extreme values.
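The range calculation above can be sketched in a few lines of Python. The helper name `value_range` and the extra value 500 are illustrative assumptions, not part of the slides; the second call shows how a single extreme value changes the result.

```python
def value_range(values):
    """Return maximum minus minimum for a non-empty sequence of numbers."""
    if not values:
        raise ValueError("range is undefined for an empty dataset")
    return max(values) - min(values)


# Observations from the worked example.
observations = [32, 41, 28, 54, 35, 26, 23, 33, 38, 40]
print(value_range(observations))  # 54 - 23 = 31

# Hypothetical extra observation (assumption) to show sensitivity to extremes.
with_outlier = observations + [500]
print(value_range(with_outlier))  # 500 - 23 = 477
```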
VARIANCE
 In statistics, variance is a measure of how much the values in a dataset
deviate from their mean (average). In simpler terms, it shows how spread out
the data points are. A higher variance indicates more variability in the data,
while a lower variance indicates the data points are closer to the mean.
 Key Points About Variance:
 Variance is always non-negative since it's based on squared deviations.
 Units of variance are the square of the original data's units (e.g., if data is
in meters, variance is in square meters).
 It's a fundamental concept used to calculate the standard deviation
(which is the square root of the variance).
PROBLEM
 EXAMPLE: The marks obtained by 5 students in a test are: 10, 12, 14, 15,
and 19. Calculate the variance.
 SOLUTION:
 Find the mean (x̄): (10+12+14+15+19)/5 = 70/5 = 14
 Find the squared deviations from the mean: For each data point xi,
calculate (xi − x̄)² and sum them:
(10−14)² + (12−14)² + (14−14)² + (15−14)² + (19−14)² = 16 + 4 + 0 + 1 + 25 = 46
 Calculate the variance for sample data: 46/(5−1) = 46/4 = 11.5
 Final Answer:
 The variance of the sample is 11.5.
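As a rough check of the steps above, the following Python sketch recomputes the sample variance by hand and compares it with the standard library's `statistics.variance`. The population variance (dividing by n instead of n − 1) is shown only for contrast and is not computed in the slides.

```python
import statistics

marks = [10, 12, 14, 15, 19]

mean = sum(marks) / len(marks)                       # 14.0
squared_deviations = [(x - mean) ** 2 for x in marks]
sum_of_squares = sum(squared_deviations)             # 46.0

# Sample variance divides by n - 1; population variance divides by n.
sample_variance = sum_of_squares / (len(marks) - 1)  # 11.5
population_variance = sum_of_squares / len(marks)    # 9.2

print(mean, sum_of_squares, sample_variance, population_variance)
print(statistics.variance(marks))  # library result for the sample variance
```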
STANDARD DEVIATION
 Standard deviation is a widely used statistical measure that tells us how
much individual data points in a dataset differ from the mean. In simpler
terms, it gives an idea of how spread out the values are. A low standard
deviation means the data points are closer to the mean, while a high
standard deviation indicates greater variability.
 The standard deviation (s or σ) is simply the square root of the variance. For sample data, s = √( Σ(xi − x̄)² / (n − 1) ); for a whole population, σ = √( Σ(xi − μ)² / N ).
 Key Points:
 Standard deviation is always non-negative.
 Its units are the same as the original data (unlike variance, which is
squared).
 It is commonly used to understand the consistency or volatility in fields
like finance, science, and quality control.
PROBLEM
 EXAMPLE: The weights (in kg) of 5 people are: 60, 62, 65, 70, and 72. Calculate
the sample standard deviation.
 SOLUTION:
 Find the mean (x̄):
 (60+62+65+70+72)/5 = 329/5 = 65.8
 Find the deviations from the mean, square them, and then sum them:
 (60−65.8)² + (62−65.8)² + (65−65.8)² + (70−65.8)² + (72−65.8)² = 104.8
 Divide by n−1: Since this is a sample, where n=5:
 104.8/4 = 26.2
 Take the square root to find the standard deviation:
 s = √26.2 ≈ 5.12
 Final Answer:
 The sample standard deviation is approximately 5.12 kg.
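The same calculation can be sketched in Python, both step by step and with `statistics.stdev`, which also divides by n − 1. Variable names are illustrative.

```python
import math
import statistics

weights_kg = [60, 62, 65, 70, 72]
n = len(weights_kg)

mean = sum(weights_kg) / n                                 # 65.8
sum_of_squares = sum((x - mean) ** 2 for x in weights_kg)  # about 104.8
sample_sd = math.sqrt(sum_of_squares / (n - 1))            # sqrt(26.2)

print(round(mean, 1), round(sum_of_squares, 1), round(sample_sd, 2))  # 65.8 104.8 5.12
print(round(statistics.stdev(weights_kg), 2))                          # 5.12
```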
QUARTILE DEVIATION
 The quartile deviation, also known as the semi-interquartile range, is a
measure of statistical dispersion. It shows the spread of the middle 50% of
data in a dataset, providing an idea of variability while being less affected by
extreme values (outliers). It is calculated using the difference between the
third quartile (Q3) and the first quartile (Q1).
 Formula for Quartile Deviation: QD = (Q3 − Q1) / 2
CONT…
 Key Points:
 Quartile deviation focuses on the central portion of the dataset, making it
robust against outliers.
 It helps describe the consistency of data.
 Quartile deviation is often used in conjunction with other measures like
the range and standard deviation for a more complete understanding of
variability.
PROBLEM
 EXAMPLE: The following data represents the ages of 10 people: 12, 15, 18,
21, 22, 24, 27, 30, 33, 36. Calculate the quartile deviation.
 SOLUTION:
 Arrange the data in ascending order: The data is already arranged: 12,
15, 18, 21, 22, 24, 27, 30, 33, 36.
 Find the first quartile (Q1) and third quartile (Q3):
 Q1: This is the value below which 25% of the data lies. Use the formula:
Q1 = ((n+1)/4)th position.
 Q3: This is the value below which 75% of the data lies. Use the formula:
Q3 = (3(n+1)/4)th position.
 Q1: Here, n=10. Value at the ((10+1)/4)th position = value at the 2.75th position. Interpolating
between the 2nd and 3rd values (15 and 18): 15 + 0.75 × (18 − 15)
= 15 + 2.25 = 17.25
CONT…
 Q3: Here, n=10. Value at the (3(10+1)/4)th position = value at the 8.25th position.
Interpolating between the 8th and 9th values (30 and 33): 30 + 0.25 × (33 − 30)
= 30 + 0.75 = 30.75
 Calculate the quartile deviation:
 Quartile Deviation = (30.75 − 17.25)/2 = 13.5/2 = 6.75
 Final Answer:
 The quartile deviation is 6.75.
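A minimal Python sketch of the (n+1)/4 position method with linear interpolation, as used in this example. The helper `value_at_position` is a hypothetical name introduced for illustration; other textbooks and software use different quartile conventions and may give slightly different values.

```python
def value_at_position(sorted_values, position):
    """Value at a 1-based, possibly fractional position, by linear interpolation."""
    whole = int(position)          # e.g. 2 for position 2.75
    fraction = position - whole    # e.g. 0.75
    if whole < 1 or whole > len(sorted_values):
        raise ValueError("position falls outside the data")
    lower = sorted_values[whole - 1]
    if fraction == 0:
        return lower
    if whole == len(sorted_values):
        raise ValueError("cannot interpolate beyond the last value")
    upper = sorted_values[whole]
    return lower + fraction * (upper - lower)


ages = [12, 15, 18, 21, 22, 24, 27, 30, 33, 36]  # already sorted
n = len(ages)

q1 = value_at_position(ages, (n + 1) / 4)       # position 2.75
q3 = value_at_position(ages, 3 * (n + 1) / 4)   # position 8.25
quartile_deviation = (q3 - q1) / 2

print(q1, q3, quartile_deviation)  # 17.25 30.75 6.75
```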
MEAN DEVIATION
 The mean deviation (also called the average deviation) is a measure of
dispersion that represents the average distance of all data points from the
mean or median of the dataset. It provides insight into how spread out the
data values are around a central point.
 Formula for Mean Deviation: MD = Σ|xi − A| / n, where A is the mean or the median and n is the number of observations.
CONT…
 Key Points:
 Absolute values are used in mean deviation to avoid negative
deviations canceling out positive deviations.
 The mean deviation can be calculated using either the mean or the
median as the central point. For datasets with skewed distributions, the
median is often preferred.
 It's simple to compute and provides a measure of variability that's easy to
interpret.
PROBLEM
 EXAMPLE: The ages of 6 students in a class are: 10, 12, 14, 16, 18, and
20. Calculate the mean deviation about the mean.
 SOLUTION:
 Find the mean (x̄):
 (10+12+14+16+18+20)/6 = 90/6 = 15
 Find the deviations from the mean (|xi − x̄|): Calculate the absolute
differences for each data point and sum them:
 |10−15| + |12−15| + |14−15| + |16−15| + |18−15| + |20−15| =
5 + 3 + 1 + 1 + 3 + 5 = 18
 Calculate the mean deviation:
 Mean Deviation = 18/6 = 3
 Final Answer:
 The mean deviation about the mean is 3.
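The computation can be sketched in Python with a small helper that accepts any central point. The function name `mean_deviation` is illustrative. The deviation about the median is included only because the key points mention it as an alternative.

```python
import statistics


def mean_deviation(values, center):
    """Average absolute distance of the values from the chosen central point."""
    return sum(abs(x - center) for x in values) / len(values)


ages = [10, 12, 14, 16, 18, 20]

about_mean = mean_deviation(ages, statistics.mean(ages))      # center = 15
about_median = mean_deviation(ages, statistics.median(ages))  # center = 15 here too

print(about_mean, about_median)  # 3.0 3.0
```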
COEFFICIENT OF VARIATION
 The coefficient of variation (CV) is a statistical measure of relative
variability. It expresses the standard deviation of a dataset as a percentage of
its mean, making it a dimensionless number that allows for easy comparison
between datasets with different units or scales.
 Formula for Coefficient of Variation:
 Coefficient of Variation (CV)= (Standard Deviation/Mean)×100
 Where:
 Standard Deviation measures the dispersion or spread of the data.
 Mean represents the central value of the dataset.
CONT…
 Key Points:
 Interpretation:
 A higher CV indicates greater relative variability in the data.
 A lower CV indicates the data points are more consistent relative to the mean.
 Applications:
 Commonly used in fields like finance, engineering, and quality control to
compare risk or variability across datasets.
 For example, in finance, CV is used to compare the risk of different investments.
 Important Note:
 CV is meaningful only for data measured on a ratio scale (where the
zero point is absolute and meaningful).
PROBLEM
 EXAMPLE: The monthly incomes (in Rs.) of 5 individuals are: 40,000,
42,000, 38,000, 50,000, and 45,000. Calculate the coefficient of
variation.
 SOLUTION:
 Find the mean (x̄):
 Mean = (40000+42000+38000+50000+45000)/5 = 215000/5 = 43000
 Find the standard deviation (s):
 Sum the squared deviations:
 (40000−43000)² + (42000−43000)² + (38000−43000)² + (50000−43000)² +
(45000−43000)² = 88,000,000
CONT…
 Hence, Variance s² = (sum of squares)/(n−1) =
88,000,000/4 = 22,000,000
 Standard Deviation s = √22,000,000 ≈ 4690.42
 Calculate the coefficient of variation (CV):
 CV = (s/mean) × 100 = (4690.42/43000) × 100 ≈ 10.91%
 Final Answer:
 The coefficient of variation (CV) is approximately 10.91%.
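A short Python sketch of the same calculation, assuming the sample standard deviation (n − 1 divisor) as in the worked example. Variable names are illustrative.

```python
import statistics

incomes_rs = [40000, 42000, 38000, 50000, 45000]

mean_income = statistics.mean(incomes_rs)   # 43000
sample_sd = statistics.stdev(incomes_rs)    # sqrt(88,000,000 / 4)
cv_percent = sample_sd / mean_income * 100  # standard deviation as a % of the mean

print(mean_income, round(sample_sd, 2), round(cv_percent, 2))  # 43000 4690.42 10.91
```

Because the CV is a ratio, rescaling every income (say, to thousands of rupees) leaves it unchanged, which is what makes it usable for comparing datasets on different scales.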
INTERQUARTILE RANGE (IQR)
 The Interquartile Range (IQR) is a measure of statistical dispersion that
shows the range within which the middle 50% of the data lies. It is the
difference between the third quartile (Q3) and the first quartile (Q1).
 Formula for IQR:
 IQR = Q3 - Q1
 Where:
 Q1 (First Quartile): The value below which 25% of the data lies.
 Q3 (Third Quartile): The value below which 75% of the data lies.
CONT…
 Key Points:
 The IQR is largely unaffected by outliers or extreme values, making it a
robust measure of spread.
 It is built from the quartiles, which divide the dataset into four equal parts, and it covers the central 50%.
 It's often used to detect outliers. Any data point falling below Q1 − 1.5 × IQR
or above Q3 + 1.5 × IQR is considered an outlier.
PROBLEM
 EXAMPLE: The following dataset represents the test scores of 10 students:
40, 42, 45, 50, 53, 57, 60, 65, 68, 70. Calculate the Interquartile Range
(IQR).
 SOLUTION:
 Arrange the data in ascending order: The data is already arranged: 40,
42, 45, 50, 53, 57, 60, 65, 68, 70.
 Find the first quartile (Q1): Q1 is the median of the lower half of the data
(excluding the overall median). The lower half is: 40, 42, 45, 50, 53. Median
of the lower half (Q1) = value at the 3rd position = 45.
 Find the third quartile (Q3): Q3 is the median of the upper half of the data
(excluding the overall median). The upper half is: 57, 60, 65, 68, 70. Median
of the upper half (Q3) = value at the 3rd position = 65.
CONT…
 Calculate the Interquartile Range (IQR): IQR=Q3−Q1=65−45=20
 Final Answer:
 The Interquartile Range (IQR) is 20.
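The 1.5 × IQR outlier rule from the key points can be applied to this example as follows. This is a sketch that takes Q1 = 45 and Q3 = 65 from the solution as given; the helper name `outlier_fences` is an assumption made for illustration.

```python
def outlier_fences(q1, q3, k=1.5):
    """Return the (lower, upper) fences Q1 - k*IQR and Q3 + k*IQR."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


scores = [40, 42, 45, 50, 53, 57, 60, 65, 68, 70]
q1, q3 = 45, 65  # quartiles taken from the worked example

lower_fence, upper_fence = outlier_fences(q1, q3)
outliers = [x for x in scores if x < lower_fence or x > upper_fence]

print(lower_fence, upper_fence, outliers)  # 15.0 95.0 []
```

None of the ten scores falls outside the fences, so this dataset has no outliers under the rule.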
QUARTILES

 Definition: Quartiles are values that divide a data set into four equal parts,
each representing 25% of the data. They help summarize data distribution
and identify its spread.
 Q1 (First Quartile): The value below which 25% of the data lies (25th
percentile).
 Q2 (Second Quartile): The median of the data set, dividing it into two
halves (50th percentile).
 Q3 (Third Quartile): The value below which 75% of the data lies (75th
percentile).
 Interquartile Range (IQR): The difference between Q3 and Q1, used to
measure the spread of the middle 50% of the data.
PROBLEM
 Consider the data set: 5, 7, 8, 12, 15, 18, 22, 24, 30
 Steps to calculate the quartiles:
 Order the data (already sorted here).
 Find Q2 (Median): The middle value is 15.
 Find Q1 (First Quartile):
 Data below the median: 5, 7, 8, 12
 Median of this subset: 7.5
 Find Q3 (Third Quartile):
 Data above the median: 18, 22, 24, 30
 Median of this subset: 23
 Quartiles: Q1 = 7.5, Q2 = 15, Q3 = 23 Interquartile Range (IQR) = Q3 -
Q1 = 23 - 7.5 = 15.5
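The median-of-halves method used in this problem can be sketched in Python as below. The function name `quartiles_by_halves` is illustrative, and the sketch assumes the convention shown in the slides: when the number of observations is odd, the overall median is excluded from both halves. It needs at least two observations.

```python
import statistics


def quartiles_by_halves(values):
    """Return (Q1, Q2, Q3) using the median of the lower and upper halves."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower_half = data[:half]
    # For odd n, skip the middle value so it belongs to neither half.
    upper_half = data[half + 1:] if n % 2 else data[half:]
    return (
        statistics.median(lower_half),
        statistics.median(data),
        statistics.median(upper_half),
    )


q1, q2, q3 = quartiles_by_halves([5, 7, 8, 12, 15, 18, 22, 24, 30])
print(q1, q2, q3, q3 - q1)  # 7.5 15 23.0 15.5
```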
QUARTILE VALUES FOR CONTINUOUS VARIABLES
 Quartile Formula
 For any quartile Qk (where k is 1, 2, or 3):
 Qk=L+(kN/4−F)×h/f
 Here’s what the symbols mean:
 L: The lower boundary of the quartile class.
 N: Total number of observations.
 F: Cumulative frequency of the class before the quartile class.
 f: Frequency of the quartile class.
 h: Class width (upper boundary - lower boundary of the quartile class).
 k: Quartile number (1 for Q1, 2 for Q2, and 3 for Q3).
DECILES
 Definition
 Deciles are points that split a data set into 10 equal sections, each containing
10% of the data. They are similar to quartiles, except that instead of dividing
the data into four parts, deciles divide it into ten parts.
 D1 (First Decile): The value below which 10% of the data lies (10th
percentile).
 D2 (Second Decile): The value below which 20% of the data lies (20th
percentile), and so on...
 D9 (Ninth Decile): The value below which 90% of the data lies (90th
percentile).
PROBLEM

 Consider this data set: 5, 10, 12, 18, 22, 30, 35, 40, 50, 55
 Step-by-Step Calculation:
 Sort the Data: The data is already sorted in ascending order.
 Formula for Decile: For a specific decile Dk:
 Dk = (k × (N+1)/10)th value
 where N is the number of observations, and k is the decile number.
 D9 = (9 × (10+1)/10)th value = value at the 9.9th position, which lies between the 9th and 10th values (50 and 55).
 D9 = 50 + 0.9 × (55 − 50) = 50 + 4.5 = 54.5
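DECILE VALUES FOR CONTINUOUS VARIABLES
 For any decile Dk (where k is 1 to 9): Dk = L + (kN/10 − F) × h/f, with the symbols defined as for quartiles but applied to the decile class.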
PERCENTILE

 Percentiles are statistical measures that divide a dataset into 100 equal parts.
Each percentile represents a value below which a certain percentage of
observations fall. For example, the 90th percentile means that 90% of the data
points are below that value.
 They are widely used in education, finance, health, and other fields to compare
individual values against a broader dataset. In grouped data, percentiles help in
understanding distribution patterns and identifying extremes or trends.
FORMULA FOR DISCRETE VARIABLES
 Pk = (k × (N+1)/100)th value
 P50 = MEDIAN
 DATA SET: 10, 23, 28, 29, 30
 MEDIAN = 28
 P50 = (50 × (5+1)/100)th value = value at the 3rd position = 28
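The position rule k × (N+1)/100 for percentiles has the same shape as the decile rule k × (N+1)/10, so both can be sketched with one Python helper. The name `partition_value` and its `parts` parameter are assumptions introduced for illustration (4 for quartiles, 10 for deciles, 100 for percentiles).

```python
def partition_value(values, k, parts):
    """Value at the k*(N+1)/parts-th position, with linear interpolation."""
    data = sorted(values)
    position = k * (len(data) + 1) / parts
    whole = int(position)
    fraction = position - whole
    if whole < 1 or whole > len(data) or (whole == len(data) and fraction > 0):
        raise ValueError("position falls outside the data")
    if fraction == 0:
        return data[whole - 1]
    return data[whole - 1] + fraction * (data[whole] - data[whole - 1])


# P50 of the five-value data set: 3rd position.
p50 = partition_value([10, 23, 28, 29, 30], k=50, parts=100)

# D9 of the ten-value decile example: 9.9th position.
d9 = partition_value([5, 10, 12, 18, 22, 30, 35, 40, 50, 55], k=9, parts=10)

print(p50, round(d9, 2))  # 28 54.5
```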
PROBLEM
 From the following data calculate the median, Q1, Q2, Q3, D7, and P70.

MARKS           NO. OF STUDENTS
Less than 10    5
Less than 20    13
Less than 30    20
Less than 40    32
Less than 50    60
Less than 60    80
Less than 70    90
Less than 80    100
SOLUTION

COMPUTATION OF MEDIAN, QUARTILES, DECILE, PERCENTILE

CLASS           CLASS BOUNDARIES    FREQUENCY (f)      CUMULATIVE FREQUENCY
Less than 10    Below 9.5           5                  5
10-19           9.5-19.5            13 − 5 = 8         13
20-29           19.5-29.5           20 − 13 = 7        20
30-39           29.5-39.5           32 − 20 = 12       32
40-49           39.5-49.5           60 − 32 = 28       60
50-59           49.5-59.5           80 − 60 = 20       80
60-69           59.5-69.5           90 − 80 = 10       90
70-79           69.5-79.5           100 − 90 = 10      100
CONT…
I. MEDIAN: N/2 = 100/2 = 50, which lies in the class 39.5-49.5.
MEDIAN = 39.5 + (10/28) × (100/2 − 32) = 45.93
II. QUARTILES Q1, Q2, Q3
Q1: N/4 = 100/4 = 25, which lies in 29.5-39.5; hence Q1 = 29.5 + (10/12) × (100/4 − 20) = 33.67
Q2 = MEDIAN = 45.93
Q3: 3N/4 = 300/4 = 75, which lies in 49.5-59.5; hence Q3 = 49.5 + (10/20) × (300/4 − 60) = 57.0
III. DECILE: N/10 = 100/10 = 10
D7: 7N/10 = 700/10 = 70, which lies in 49.5-59.5; hence D7 = 49.5 + (10/20) × (700/10 − 60) = 54.5
IV. PERCENTILE: N/100 = 100/100 = 1
P70: 70N/100 = 7000/100 = 70, which lies in 49.5-59.5; hence P70 = 49.5 + (10/20) × (7000/100 − 60) = 54.5
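The grouped-data formula L + (kN/parts − F) × h/f can be checked against these results with a short Python sketch. The function name `grouped_partition_value` is illustrative. The lower boundary of the first class is not given in the table ("Below 9.5"), so the value −0.5 used here is an assumption; it does not affect any of the results below because none of them falls in that class.

```python
def grouped_partition_value(boundaries, cumulative, k, parts):
    """Apply L + (k*N/parts - F) * h / f to less-than cumulative frequencies."""
    total = cumulative[-1]             # N
    target = k * total / parts         # k*N/parts
    previous_cf = 0                    # F for the current class
    for (lower, upper), cf in zip(boundaries, cumulative):
        if cf >= target:
            frequency = cf - previous_cf            # f
            width = upper - lower                   # h
            return lower + (target - previous_cf) * width / frequency
        previous_cf = cf
    raise ValueError("target position exceeds the total frequency")


# Class boundaries from the solution table; -0.5 is an assumed lower bound.
boundaries = [(-0.5, 9.5), (9.5, 19.5), (19.5, 29.5), (29.5, 39.5),
              (39.5, 49.5), (49.5, 59.5), (59.5, 69.5), (69.5, 79.5)]
cumulative = [5, 13, 20, 32, 60, 80, 90, 100]

median = grouped_partition_value(boundaries, cumulative, 2, 4)
q1 = grouped_partition_value(boundaries, cumulative, 1, 4)
q3 = grouped_partition_value(boundaries, cumulative, 3, 4)
d7 = grouped_partition_value(boundaries, cumulative, 7, 10)
p70 = grouped_partition_value(boundaries, cumulative, 70, 100)

print(round(median, 2), round(q1, 2), q3, d7, p70)  # 45.93 33.67 57.0 54.5 54.5
```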
